Jakob Stoklund Olesen
2012-Jul-26 17:04 UTC
[LLVMdev] X86 sub_ss and sub_sd sub-register indexes
On Jul 26, 2012, at 9:43 AM, dag at cray.com wrote:

> Jakob Stoklund Olesen <jolesen at apple.com> writes:
>
>> As far as I can tell, all sub-register operations involving sub_ss and
>> sub_sd can simply be replaced with COPY_TO_REGCLASS:
>>
>> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>           (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
>>                                                  sub_sd))>;
>>
>> Becomes:
>>
>> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>           (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;
>
> A few questions:
>
> Will COPY_TO_REGCLASS actually generate a copy instruction or can
> TableGen/isel fold it away?

Both EXTRACT_SUBREG and COPY_TO_REGCLASS are emitted as COPY instructions by InstrEmitter: EXTRACT_SUBREG as a sub-register copy, COPY_TO_REGCLASS as a full register copy. Both are handled by the register coalescer.

It would actually be possible to have EmitCopyToRegClassNode() try to call MRI->constrainRegClass() first, just like AddRegisterOperand() does. That could avoid the copy in some cases, and you would simply get a VR128 register as the second VMOVSDrr operand.

I am not proposing we do that for now. Let the register coalescer deal with that.

> What happens if the result of the above pattern using COPY_TO_REGCLASS
> is spilled? Will we get a 64-bit store or a 128-bit store?

This behavior isn't affected by the change. FR64 registers are spilled with 64-bit stores, and VR128 registers are spilled with 128-bit stores.

When the register coalescer removes a copy between VR128 and FR64 registers, it chooses the larger spill size for the result. This is the same for sub-register copies and full register copies.

The important point here is that VR128 is a sub-class of FR64, so getCommonSubClass(VR128, FR64) -> VR128. This is the Liskov substitution principle for register classes.

/jakob
dag at cray.com
2012-Jul-26 17:28 UTC
Jakob Stoklund Olesen <jolesen at apple.com> writes:

>> What happens if the result of the above pattern using COPY_TO_REGCLASS
>> is spilled? Will we get a 64-bit store or a 128-bit store?
>
> This behavior isn't affected by the change. FR64 registers are spilled
> with 64-bit stores, and VR128 registers are spilled with 128-bit
> stores.
>
> When the register coalescer removes a copy between VR128 and FR64
> registers, it chooses the larger spill size for the result. This is
> the same for sub-register copies and full register copies.

So if I understand this correctly, a pattern like this:

def : Pat<(f64 (vector_extract (v2f64 VR128:$src), (iPTR 0))),
          (f64 (EXTRACT_SUBREG (v2f64 VR128:$src), sub_sd))>;

will currently use a 128-bit store if it is spilled?

That's really not good. If the 128-bit register is not ever used as a 128-bit register, shouldn't the coalescer pick the 64- or 32-bit register?

-Dave
Jakob Stoklund Olesen
2012-Jul-26 17:43 UTC
On Jul 26, 2012, at 10:28 AM, dag at cray.com wrote:

> Jakob Stoklund Olesen <jolesen at apple.com> writes:
>
>>> What happens if the result of the above pattern using COPY_TO_REGCLASS
>>> is spilled? Will we get a 64-bit store or a 128-bit store?
>>
>> This behavior isn't affected by the change. FR64 registers are spilled
>> with 64-bit stores, and VR128 registers are spilled with 128-bit
>> stores.
>>
>> When the register coalescer removes a copy between VR128 and FR64
>> registers, it chooses the larger spill size for the result. This is
>> the same for sub-register copies and full register copies.
>
> So if I understand this correctly, a pattern like this:
>
> def : Pat<(f64 (vector_extract (v2f64 VR128:$src), (iPTR 0))),
>           (f64 (EXTRACT_SUBREG (v2f64 VR128:$src), sub_sd))>;
>
> will currently use a 128-bit store if it is spilled?

It will if we coalesce the COPY away, yes.

None of this is dependent on our using sub-registers, though. The coalescer treats sub-register copies and full register copies equally.

> If the 128-bit register is not ever used as a 128-bit register,
> shouldn't the coalescer pick the 64- or 32-bit register?

That optimization is not currently implemented for sub-registers. For example, if you create a GR64 virtual register and only ever use the sub_32bit sub-register, it would be possible to replace the virtual register with a GR32 register. It's not impossible to do, but it doesn't come up a lot.

When not using sub-registers, the optimization does exist. For example, if you have a VR128 virtual register, but all the instructions using it only require FR32, MRI->recomputeRegClass() will figure it out, and downgrade to FR32. It gets permission to do this because X86RegisterInfo::getLargestLegalSuperClass(VR128) returns FR32.

/jakob