Jakob Stoklund Olesen
2012-Jul-26 15:02 UTC
[LLVMdev] X86 sub_ss and sub_sd sub-register indexes
All,

I've been trying to simplify the way LLVM models sub-register relationships a bit, and the X86 sub_ss and sub_sd sub-register indices are getting in the way. I want to get rid of them.

These sub-registers are special, they are only mentioned here:

  let CompositeIndices = [(sub_ss), (sub_sd)] in {
  def XMM0: Register<"xmm0">, DwarfRegNum<[17, 21, 21]>;
  def XMM1: Register<"xmm1">, DwarfRegNum<[18, 22, 22]>;
  ...

This secret syntax means that the indexes are idempotent:

  getSubReg(YMM0, sub_ss) --> XMM0
  getSubReg(XMM0, sub_ss) --> XMM0

They are supposed to represent the 32-bit and 64-bit low parts of the xmm registers, but since we don't define explicit registers for those sub-registers, we are left with idempotent sub-register indexes.

We have three different register classes for the xmm registers: FR32, FR64, and VR128. The sub_ss and sub_sd indexes used to play a role in selecting the right register class, but not any longer. That is all derived from the instruction descriptions now.

As far as I can tell, all sub-register operations involving sub_ss and sub_sd can simply be replaced with COPY_TO_REGCLASS:

  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
            (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
                                    sub_sd))>;

Becomes:

  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
            (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;

By eliminating these indexes, I can remove the 'CompositeIndices' syntax and TableGen's handling of loops in the sub-register graph. I can assert that every sub-register has a unique name, and that can be used to compress tables a bit more.

/jakob
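
[Editor's illustration] The idempotent behaviour described above can be pictured with a small toy model. This is only a sketch: the dictionary SUB_REGS and the helpers get_sub_reg and self_loops are hypothetical names made up for this note, not LLVM data structures or APIs. The two table entries are the two getSubReg results quoted in the message. A self-loop in this table is the kind of cycle in the sub-register graph that the message says TableGen currently has to handle.

```python
# Toy model of a sub-register table. Names here are illustrative only;
# they are not LLVM's actual tables or functions.
SUB_REGS = {
    ("YMM0", "sub_ss"): "XMM0",
    ("XMM0", "sub_ss"): "XMM0",  # idempotent: XMM0 maps back to itself
}


def get_sub_reg(reg, idx):
    """Return the sub-register of `reg` at index `idx`, or None if absent."""
    return SUB_REGS.get((reg, idx))


def self_loops(table):
    """List (register, index) pairs whose sub-register is the register itself."""
    return sorted((reg, idx) for (reg, idx), sub in table.items() if sub == reg)


if __name__ == "__main__":
    once = get_sub_reg("YMM0", "sub_ss")
    twice = get_sub_reg(once, "sub_ss")
    print(once, twice)           # applying sub_ss again changes nothing
    print(self_loops(SUB_REGS))  # entries that form loops in the graph
```

Under this model, removing the sub_ss entries would leave a table with no self-loops, which matches the stated goal of asserting that every sub-register has a unique name.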
dag at cray.com
2012-Jul-26 16:43 UTC
[LLVMdev] X86 sub_ss and sub_sd sub-register indexes
Jakob Stoklund Olesen <jolesen at apple.com> writes:

> These sub-registers are special, they are only mentioned here:
>
> let CompositeIndices = [(sub_ss), (sub_sd)] in {
> def XMM0: Register<"xmm0">, DwarfRegNum<[17, 21, 21]>;
> def XMM1: Register<"xmm1">, DwarfRegNum<[18, 22, 22]>;
> ...

I'm confused. Below you note that they are used in patterns, so they are certainly mentioned more than just in the code above.

> As far as I can tell, all sub-register operations involving sub_ss and
> sub_sd can simply be replaced with COPY_TO_REGCLASS:
>
> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>           (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
>                                   sub_sd))>;
>
> Becomes:
>
> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>           (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;

A few questions:

Will COPY_TO_REGCLASS actually generate a copy instruction or can TableGen/isel fold it away?

What happens if the result of the above pattern using COPY_TO_REGCLASS is spilled? Will we get a 64-bit store or a 128-bit store?

-Dave
Jakob Stoklund Olesen
2012-Jul-26 17:04 UTC
[LLVMdev] X86 sub_ss and sub_sd sub-register indexes
On Jul 26, 2012, at 9:43 AM, dag at cray.com wrote:

> Jakob Stoklund Olesen <jolesen at apple.com> writes:
>
>> As far as I can tell, all sub-register operations involving sub_ss and
>> sub_sd can simply be replaced with COPY_TO_REGCLASS:
>>
>> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>           (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
>>                                   sub_sd))>;
>>
>> Becomes:
>>
>> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>           (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;
>
> A few questions:
>
> Will COPY_TO_REGCLASS actually generate a copy instruction or can
> TableGen/isel fold it away?

Both EXTRACT_SUBREG and COPY_TO_REGCLASS are emitted as COPY instructions by InstrEmitter. One as a sub-register copy, one as a full register copy. Both are handled by the register coalescer.

It would actually be possible to have EmitCopyToRegClassNode() try to call MRI->constrainRegClass() first, just like AddRegisterOperand() does. That could avoid the copy in some cases, and you would simply get a VR128 register as the second VMOVSDrr operand. I am not proposing we do that for now. Let the register coalescer deal with that.

> What happens if the result of the above pattern using COPY_TO_REGCLASS
> is spilled? Will we get a 64-bit store or a 128-bit store?

This behavior isn't affected by the change. FR64 registers are spilled with 64-bit stores, and VR128 registers are spilled with 128-bit stores.

When the register coalescer removes a copy between VR128 and FR64 registers, it chooses the larger spill size for the result. This is the same for sub-register copies and full register copies.

The important point here is that VR128 is a sub-class of FR64, so getCommonSubClass(VR128, FR64) -> VR128. This is the Liskov substitution principle for register classes.

/jakob
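
[Editor's illustration] The sub-class argument above can be sketched with a toy model. Everything here is an assumption made for illustration: the RegClass type, is_sub_class, and get_common_sub_class are hypothetical Python stand-ins, not LLVM's implementation. The register list XMM0-XMM15 is assumed, and the sub-class rule is simplified to "registers are a subset and the spill size is at least as large". Only the spill sizes (64-bit for FR64, 128-bit for VR128) and the result getCommonSubClass(VR128, FR64) -> VR128 come from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegClass:
    """Toy register class: a name, a set of registers, and a spill size."""
    name: str
    regs: frozenset
    spill_bits: int


# Assumed register list for illustration; both classes cover the xmm registers.
XMM = frozenset(f"XMM{i}" for i in range(16))
FR64 = RegClass("FR64", XMM, 64)     # spilled with 64-bit stores
VR128 = RegClass("VR128", XMM, 128)  # spilled with 128-bit stores


def is_sub_class(a, b):
    """Simplified rule: `a` can stand in wherever `b` is required if its
    registers are all in `b` and its spill slot is at least as large."""
    return a.regs <= b.regs and a.spill_bits >= b.spill_bits


def get_common_sub_class(a, b):
    """Return whichever of the two classes is a sub-class of the other.
    A fuller model would also search other known classes; this one does not."""
    if is_sub_class(a, b):
        return a
    if is_sub_class(b, a):
        return b
    return None


if __name__ == "__main__":
    merged = get_common_sub_class(VR128, FR64)
    print(merged.name, merged.spill_bits)
```

In this sketch the coalesced register ends up in the class with the larger spill size, which is the substitution property the message refers to: a VR128 register is acceptable anywhere an FR64 register is required, but not the other way around.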