Jakob Stoklund Olesen

2012-Jul-26 17:43 UTC

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

On Jul 26, 2012, at 10:28 AM, dag at cray.com wrote:
> Jakob Stoklund Olesen <jolesen at apple.com> writes:
> 
>>> What happens if the result of the above pattern using COPY_TO_REGCLASS
>>> is spilled?  Will we get a 64-bit store or a 128-bit store?
>> 
>> This behavior isn't affected by the change. FR64 registers are spilled
>> with 64-bit stores, and VR128 registers are spilled with 128-bit
>> stores.
>> 
>> When the register coalescer removes a copy between VR128 and FR64
>> registers, it chooses the larger spill size for the result. This is
>> the same for sub-register copies and full register copies.
> 
> So if I understand this correctly, a pattern like this:
> 
>  def : Pat<(f64 (vector_extract (v2f64 VR128:$src), (iPTR 0))),
>            (f64 (EXTRACT_SUBREG (v2f64 VR128:$src), sub_sd))>;
> 
> will currently use a 128-bit store if it is spilled?

It will if we coalesce the COPY away, yes.

None of this is dependent on our using sub-registers, though. The coalescer
treats sub-register copies and full register copies equally.
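
The spill-size rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyRegClass` and `coalescedClass` are hypothetical names invented for illustration, not LLVM APIs, and the real coalescer works on register class relationships rather than comparing sizes directly.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a register class. This is not LLVM's TargetRegisterClass.
struct ToyRegClass {
  std::string name;
  unsigned spillBits;  // size of the spill slot in bits
};

// Hypothetical helper: the class of the single register that remains after a
// copy between a register of class `a` and one of class `b` is coalesced away.
// The toy rule keeps whichever class has the larger spill slot.
const ToyRegClass &coalescedClass(const ToyRegClass &a, const ToyRegClass &b) {
  assert(a.spillBits > 0 && b.spillBits > 0 && "classes must be spillable");
  return a.spillBits >= b.spillBits ? a : b;
}

// Spill sizes as stated in this thread: FR64 spills 64 bits, VR128 spills 128.
const ToyRegClass FR64{"FR64", 64};
const ToyRegClass VR128{"VR128", 128};
```

In this model, `coalescedClass(VR128, FR64)` yields the VR128 entry, so a spill of the merged register would be a 128-bit store, whether the removed copy was a full copy or a sub-register copy.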

> If the 128-bit register is not ever used as a 128-bit register,
> shouldn't the coalescer pick the 64- or 32-bit register?

That optimization is not currently implemented for sub-registers. For example,
if you create a GR64 virtual register and only ever use the sub_32bit
sub-register, it would be possible to replace the virtual register with a GR32
register. It's not impossible to do, but it doesn't come up a lot.

When not using sub-registers, the optimization does exist. For example, if you
have a VR128 virtual register, but all the instructions using it only require
FR32, MRI->recomputeRegClass() will figure it out, and downgrade to FR32.

It gets permission to do this because
X86RegisterInfo::getLargestLegalSuperClass(VR128) returns FR32.
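
A rough sketch of that downgrade, as a toy model rather than the actual LLVM implementation. The names `XmmClass`, `commonSubClass`, `largestLegalSuperClass`, and `recomputeClass` are hypothetical stand-ins, and the model assumes the three classes form a simple chain over the same XMM registers in which a larger spill size means a more constrained class.

```cpp
#include <cassert>
#include <vector>

// Toy chain of classes over the same registers. The enumerator value is the
// spill size in bits; a larger value means a more constrained class.
enum class XmmClass : unsigned { FR32 = 32, FR64 = 64, VR128 = 128 };

// In this toy chain, the common sub-class is the more constrained of the two.
XmmClass commonSubClass(XmmClass a, XmmClass b) { return a > b ? a : b; }

// Stand-in for the target hook. The thread states that the X86 hook maps
// VR128 to FR32; the toy version returns the top of the chain for any input.
XmmClass largestLegalSuperClass(XmmClass) { return XmmClass::FR32; }

// Hypothetical analogue of recomputeRegClass: start from the largest legal
// super-class, then narrow it by the class each using instruction requires.
XmmClass recomputeClass(XmmClass current,
                        const std::vector<XmmClass> &constraints) {
  XmmClass result = largestLegalSuperClass(current);
  for (XmmClass c : constraints) {
    assert(c <= current && "current class must already satisfy every use");
    result = commonSubClass(result, c);
  }
  return result;
}
```

With these assumptions, a VR128 register whose instructions all require only FR32 comes out as FR32, while a single instruction that requires VR128 keeps the register in VR128.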

/jakob

dag at cray.com

2012-Jul-26 18:16 UTC

Jakob Stoklund Olesen <jolesen at apple.com> writes:
>> If the 128-bit register is not ever used as a 128-bit register,
>> shouldn't the coalescer pick the 64- or 32-bit register?
>
> That optimization is not currently implemented for sub-registers. For
> example, if you create a GR64 virtual register and only ever use the
> sub_32bit sub-register, it would be possible to replace the virtual
> register with a GR32 register. It's not impossible to do, but it
> doesn't come up a lot.

It does come up a lot in vector code.  Extraction of scalar values from
vectors is pretty common, especially given the limitations of SSE/AVX.
Typically we have done this using EXTRACT_SUBREG.  So either we would
have to prevent coalescing to avoid a 128-bit spill or we would always
have to use a 128-bit spill even if we never use anything but the scalar
value.

Neither option is a good one.

> When not using sub-registers, the optimization does exist. For
> example, if you have a VR128 virtual register, but all the
> instructions using it only require FR32, MRI->recomputeRegClass() will
> figure it out, and downgrade to FR32.

I don't think this optimization applies because the SSE/AVX instruction
defines a vector register but we never use the upper elements.

Would adding Fs patterns for these cases, forcing the result register to
FR64, help?

What does Fs mean anyway, "fake scalar?"  :)

                                -Dave

Jakob Stoklund Olesen

2012-Jul-26 19:50 UTC

On Jul 26, 2012, at 11:16 AM, dag at cray.com wrote:
> Jakob Stoklund Olesen <jolesen at apple.com> writes:
> 
>>> If the 128-bit register is not ever used as a 128-bit register,
>>> shouldn't the coalescer pick the 64- or 32-bit register?
>> 
>> That optimization is not currently implemented for sub-registers. For
>> example, if you create a GR64 virtual register and only ever use the
>> sub_32bit sub-register, it would be possible to replace the virtual
>> register with a GR32 register. It's not impossible to do, but it
>> doesn't come up a lot.
> 
> It does come up a lot in vector code.  Extraction of scalar values from
> vectors is pretty common, especially given the limitations of SSE/AVX.
> Typically we have done this using EXTRACT_SUBREG.  So either we would
> have to prevent coalescing to avoid a 128-bit spill or we would always
> have to use a 128-bit spill even if we never use anything but the scalar
> value.
> 
> Neither option is a good one.

If you feel this is important, please file a PR with a test case where it
matters. It is orthogonal to the topic of this thread.

/jakob
