Jakob Stoklund Olesen

2012-Jul-26 15:02 UTC

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

All,

I've been trying to simplify the way LLVM models sub-register relationships
a bit, and the X86 sub_ss and sub_sd sub-register indices are getting in the
way. I want to get rid of them.

These sub-registers are special; they are only mentioned here:

  let CompositeIndices = [(sub_ss), (sub_sd)] in {
  def XMM0: Register<"xmm0">, DwarfRegNum<[17, 21, 21]>;
  def XMM1: Register<"xmm1">, DwarfRegNum<[18, 22, 22]>;
  ...

This secret syntax means that the indexes are idempotent:

  getSubReg(YMM0, sub_ss) --> XMM0
  getSubReg(XMM0, sub_ss) --> XMM0

They are supposed to represent the 32-bit and 64-bit low parts of the xmm
registers, but since we don't define explicit registers for those
sub-registers, we are left with idempotent sub-register indexes.
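
To make the idempotence concrete, here is a minimal C++ sketch of a toy sub-register table. It is not LLVM code: SubRegTable, this getSubReg, and isIdempotentOn are hypothetical stand-ins written for illustration, and the sub_sd entries are assumed by analogy with the sub_ss examples above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model only: the names mirror the thread, not LLVM's generated tables.
using Reg = std::string;
using SubIdx = std::string;

// (register, sub-register index) -> sub-register.
static const std::map<std::pair<Reg, SubIdx>, Reg> SubRegTable = {
    {{"YMM0", "sub_ss"}, "XMM0"},
    {{"XMM0", "sub_ss"}, "XMM0"}, // self-loop: XMM0 is its own sub_ss
    {{"YMM0", "sub_sd"}, "XMM0"}, // assumed by analogy with sub_ss
    {{"XMM0", "sub_sd"}, "XMM0"}, // assumed by analogy with sub_ss
};

// Hypothetical stand-in for getSubReg(); returns "" when no entry exists.
Reg getSubReg(const Reg &R, const SubIdx &Idx) {
  auto It = SubRegTable.find(std::make_pair(R, Idx));
  return It == SubRegTable.end() ? Reg() : It->second;
}

// An index is idempotent on R if applying it twice gives the same
// register as applying it once.
bool isIdempotentOn(const Reg &R, const SubIdx &Idx) {
  Reg Once = getSubReg(R, Idx);
  assert(!Once.empty() && "no such sub-register in the toy table");
  return getSubReg(Once, Idx) == Once;
}
```

In this model the XMM0 entries are edges from a node to itself, which is the kind of loop in the sub-register graph that the indexes force TableGen to handle.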

We have three different register classes for the xmm registers: FR32, FR64, and
VR128. The sub_ss and sub_sd indexes used to play a role in selecting the right
register class, but not any longer. That is all derived from the instruction
descriptions now.

As far as I can tell, all sub-register operations involving sub_ss and sub_sd
can simply be replaced with COPY_TO_REGCLASS:

  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
            (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
                                                   sub_sd))>;

Becomes:

  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
            (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;

By eliminating these indexes, I can remove the 'CompositeIndices' syntax
and TableGen's handling of loops in the sub-register graph. I can assert
that every sub-register has a unique name, and that can be used to compress
tables a bit more.

/jakob

dag at cray.com

2012-Jul-26 16:43 UTC

Jakob Stoklund Olesen <jolesen at apple.com> writes:
> These sub-registers are special, they are only mentioned here:
>
>   let CompositeIndices = [(sub_ss), (sub_sd)] in {
>   def XMM0: Register<"xmm0">, DwarfRegNum<[17, 21, 21]>;
>   def XMM1: Register<"xmm1">, DwarfRegNum<[18, 22, 22]>;
>   ...

I'm confused.  Below you note that they are used in patterns, so they
are certainly mentioned more than just in the code above.

> As far as I can tell, all sub-register operations involving sub_ss and
> sub_sd can simply be replaced with COPY_TO_REGCLASS:
>
>   def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>             (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
>                                                    sub_sd))>;
>
> Becomes:
>
>   def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>             (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;

A few questions:

Will COPY_TO_REGCLASS actually generate a copy instruction or can
TableGen/isel fold it away?

What happens if the result of the above pattern using COPY_TO_REGCLASS
is spilled?  Will we get a 64-bit store or a 128-bit store?

                                -Dave

Jakob Stoklund Olesen

2012-Jul-26 17:04 UTC

On Jul 26, 2012, at 9:43 AM, dag at cray.com wrote:
> Jakob Stoklund Olesen <jolesen at apple.com> writes:
> 
>> As far as I can tell, all sub-register operations involving sub_ss and
>> sub_sd can simply be replaced with COPY_TO_REGCLASS:
>> 
>>  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>            (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
>>                                                   sub_sd))>;
>> 
>> Becomes:
>> 
>>  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>            (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;
> 
> A few questions:
> 
> Will COPY_TO_REGCLASS actually generate a copy instruction or can
> TableGen/isel fold it away?

Both EXTRACT_SUBREG and COPY_TO_REGCLASS are emitted as COPY instructions by
InstrEmitter. One as a sub-register copy, one as a full register copy. Both are
handled by the register coalescer.

It would actually be possible to have EmitCopyToRegClassNode() try to call
MRI->constrainRegClass() first, just like AddRegisterOperand() does. That
could avoid the copy in some cases, and you would simply get a VR128 register as
the second VMOVSDrr operand. I am not proposing we do that for now. Let the
register coalescer deal with that.

> What happens if the result of the above pattern using COPY_TO_REGCLASS
> is spilled?  Will we get a 64-bit store or a 128-bit store?

This behavior isn't affected by the change. FR64 registers are spilled with
64-bit stores, and VR128 registers are spilled with 128-bit stores.

When the register coalescer removes a copy between VR128 and FR64 registers, it
chooses the larger spill size for the result. This is the same for sub-register
copies and full register copies.

The important point here is that VR128 is a sub-class of FR64, so
getCommonSubClass(VR128, FR64) -> VR128. This is the Liskov substitution
principle for register classes.
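
The sub-class argument can be sketched with a toy model in C++. This is an illustration under stated assumptions, not LLVM's implementation: the RegClass struct, the two-register member sets, and the sub-class rule used here (member subset plus a spill size at least as large) are simplifications chosen for the example. Only the 64-bit and 128-bit spill sizes and the getCommonSubClass(VR128, FR64) result come from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Toy register class: a name, a set of member registers, a spill size.
struct RegClass {
  std::string Name;
  std::set<std::string> Regs;
  unsigned SpillBits;
};

// Member sets are an assumption; both classes cover the same xmm registers.
static const RegClass FR64{"FR64", {"XMM0", "XMM1"}, 64};
static const RegClass VR128{"VR128", {"XMM0", "XMM1"}, 128};

// Simplified rule (assumption): A is a sub-class of B if every register
// in A is also in B and A's spill slot is at least as large as B's, so
// anything that accepts a B register can safely be given an A register.
bool isSubClassOf(const RegClass &A, const RegClass &B) {
  return A.SpillBits >= B.SpillBits &&
         std::includes(B.Regs.begin(), B.Regs.end(),
                       A.Regs.begin(), A.Regs.end());
}

// Toy getCommonSubClass: only considers the two inputs themselves.
const RegClass *getCommonSubClass(const RegClass &A, const RegClass &B) {
  if (isSubClassOf(A, B))
    return &A;
  if (isSubClassOf(B, A))
    return &B;
  return nullptr; // the toy model does not search for a third class
}

// Spill size of the register left after coalescing a copy between A and B.
unsigned spillBitsAfterCoalescing(const RegClass &A, const RegClass &B) {
  const RegClass *Common = getCommonSubClass(A, B);
  assert(Common && "classes cannot be coalesced in the toy model");
  return Common->SpillBits;
}
```

In this model, coalescing a VR128 register with an FR64 register leaves a VR128 register regardless of argument order, so the merged register gets the larger 128-bit spill slot.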

/jakob
