Jakob Stoklund Olesen
2012-Jul-26 17:43 UTC
[LLVMdev] X86 sub_ss and sub_sd sub-register indexes
On Jul 26, 2012, at 10:28 AM, dag at cray.com wrote:

> Jakob Stoklund Olesen <jolesen at apple.com> writes:
>
>>> What happens if the result of the above pattern using COPY_TO_REGCLASS
>>> is spilled? Will we get a 64-bit store or a 128-bit store?
>>
>> This behavior isn't affected by the change. FR64 registers are spilled
>> with 64-bit stores, and VR128 registers are spilled with 128-bit
>> stores.
>>
>> When the register coalescer removes a copy between VR128 and FR64
>> registers, it chooses the larger spill size for the result. This is
>> the same for sub-register copies and full register copies.
>
> So if I understand this correctly, a pattern like this:
>
> def : Pat<(f64 (vector_extract (v2f64 VR128:$src), (iPTR 0))),
>           (f64 (EXTRACT_SUBREG (v2f64 VR128:$src), sub_sd))>;
>
> will currently use a 128-bit store if it is spilled?

It will if we coalesce the COPY away, yes.

None of this is dependent on our using sub-registers, though. The coalescer treats sub-register copies and full register copies equally.

> If the 128-bit register is not ever used as a 128-bit register,
> shouldn't the coalescer pick the 64- or 32-bit register?

That optimization is not currently implemented for sub-registers. For example, if you create a GR64 virtual register and only ever use the sub_32bit sub-register, it would be possible to replace the virtual register with a GR32 register. It's not impossible to do, but it doesn't come up a lot.

When not using sub-registers, the optimization does exist. For example, if you have a VR128 virtual register, but all the instructions using it only require FR32, MRI->recomputeRegClass() will figure it out, and downgrade to FR32. It gets permission to do this because X86RegisterInfo::getLargestLegalSuperClass(VR128) returns FR32.

/jakob
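The two behaviors described above (the coalescer keeping the larger spill size, and recomputeRegClass() downgrading a full register when the target allows it) can be sketched as a toy C++ model. Everything here is hypothetical: RegClass, joinAcrossCopy and narrowToUses are illustrative names, not LLVM's TargetRegisterClass or MachineRegisterInfo API, and each class is reduced to its spill size (4, 8 and 16 bytes for FR32, FR64 and VR128).

```cpp
#include <vector>

// Toy model only: NOT LLVM's TargetRegisterClass or MachineRegisterInfo API.
// A register class is reduced to a name and its spill-slot size in bytes.
struct RegClass {
  const char *name;
  unsigned spillBytes;
};

const RegClass FR32{"FR32", 4};
const RegClass FR64{"FR64", 8};
const RegClass VR128{"VR128", 16};

// Coalescing away a COPY (full or sub-register) merges two virtual
// registers; per the thread, the result keeps the larger spill size.
RegClass joinAcrossCopy(const RegClass &a, const RegClass &b) {
  return a.spillBytes >= b.spillBytes ? a : b;
}

// Rough analogue of the narrowing attributed to MRI->recomputeRegClass():
// if every use is satisfied by a narrower class, and the target hook
// permits moving that far, downgrade. `largestLegal` is a stand-in for
// what getLargestLegalSuperClass(current) would return.
RegClass narrowToUses(const RegClass &current,
                      const std::vector<RegClass> &useNeeds,
                      const RegClass &largestLegal) {
  if (useNeeds.empty()) return current;
  // The widest requirement among the uses bounds how far we can shrink.
  RegClass widest = useNeeds.front();
  for (const RegClass &need : useNeeds)
    if (need.spillBytes > widest.spillBytes) widest = need;
  bool narrower = widest.spillBytes < current.spillBytes;
  // Modeled by spill size: the hook must reach at least as far as `widest`.
  bool permitted = largestLegal.spillBytes <= widest.spillBytes;
  return (narrower && permitted) ? widest : current;
}
```

With these definitions, joinAcrossCopy(VR128, FR64) yields VR128 (a 16-byte spill), while narrowToUses(VR128, {FR32, FR32}, FR32) downgrades to FR32. Passing VR128 as the third argument, i.e. a hook that grants no permission, leaves the class at VR128.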
dag at cray.com
2012-Jul-26 18:16 UTC
[LLVMdev] X86 sub_ss and sub_sd sub-register indexes
Jakob Stoklund Olesen <jolesen at apple.com> writes:

>> If the 128-bit register is not ever used as a 128-bit register,
>> shouldn't the coalescer pick the 64- or 32-bit register?
>
> That optimization is not currently implemented for sub-registers. For
> example, if you create a GR64 virtual register and only ever use the
> sub_32bit sub-register, it would be possible to replace the virtual
> register with a GR32 register. It's not impossible to do, but it
> doesn't come up a lot.

It does come up a lot in vector code. Extraction of scalar values from vectors is pretty common, especially given the limitations of SSE/AVX. Typically we have done this using EXTRACT_SUBREG. So either we would have to prevent coalescing to avoid a 128-bit spill or we would always have to use a 128-bit spill even if we never use anything but the scalar value.

Neither option is a good one.

> When not using sub-registers, the optimization does exist. For
> example, if you have a VR128 virtual register, but all the
> instructions using it only require FR32, MRI->recomputeRegClass() will
> figure it out, and downgrade to FR32.

I don't think this optimization applies because the SSE/AVX instruction defines a vector register but we never use the upper elements.

Would adding Fs patterns for these cases, forcing the result register to FR64, help? What does Fs mean anyway, "fake scalar?" :)

-Dave
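The sub-register narrowing discussed above, which the thread says is not implemented, might look roughly like this as a toy model: if every use of a virtual register reads the same sub-register index, the register could be replaced by one of the sub-register's class, shrinking its spill slot. The index-to-class table (sub_ss to FR32, sub_sd to FR64, sub_32bit to GR32) and all function names are hypothetical illustrations, not LLVM APIs.

```cpp
#include <string>
#include <vector>

// Toy model only: names below are illustrative, not LLVM APIs.
struct RegClass {
  std::string name;
  unsigned spillBytes;
};

// A use of a virtual register: either the whole register (empty subIdx)
// or a read through a sub-register index such as "sub_sd".
struct Use {
  std::string subIdx;
};

// Hypothetical table of the narrower class that holds each sub-register.
// Unknown indexes fall back to the whole register's class.
RegClass classForSubIdx(const std::string &idx, const RegClass &whole) {
  if (idx == "sub_ss") return {"FR32", 4};
  if (idx == "sub_sd") return {"FR64", 8};
  if (idx == "sub_32bit") return {"GR32", 4};
  return whole;
}

// If every use reads the same sub-register index, the register could be
// replaced by one of that sub-register's class, shrinking the spill slot.
RegClass narrowIfOnlySubRegUsed(const RegClass &whole,
                                const std::vector<Use> &uses) {
  if (uses.empty() || uses.front().subIdx.empty()) return whole;
  const std::string &idx = uses.front().subIdx;
  for (const Use &u : uses)
    if (u.subIdx != idx) return whole;  // full use or mixed indexes: keep it
  return classForSubIdx(idx, whole);
}
```

For a VR128 register read only through sub_sd, the sketch returns FR64, so a spill would need 8 bytes instead of 16; adding a single full-register use (empty subIdx) keeps VR128.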
Jakob Stoklund Olesen
2012-Jul-26 19:50 UTC
[LLVMdev] X86 sub_ss and sub_sd sub-register indexes
On Jul 26, 2012, at 11:16 AM, dag at cray.com wrote:

> Jakob Stoklund Olesen <jolesen at apple.com> writes:
>
>>> If the 128-bit register is not ever used as a 128-bit register,
>>> shouldn't the coalescer pick the 64- or 32-bit register?
>>
>> That optimization is not currently implemented for sub-registers. For
>> example, if you create a GR64 virtual register and only ever use the
>> sub_32bit sub-register, it would be possible to replace the virtual
>> register with a GR32 register. It's not impossible to do, but it
>> doesn't come up a lot.
>
> It does come up a lot in vector code. Extraction of scalar values from
> vectors is pretty common, especially given the limitations of SSE/AVX.
> Typically we have done this using EXTRACT_SUBREG. So either we would
> have to prevent coalescing to avoid a 128-bit spill or we would always
> have to use a 128-bit spill even if we never use anything but the scalar
> value.
>
> Neither option is a good one.

If you feel this is important, please file a PR with a test case where it matters.

It is orthogonal to the topic of this thread.

/jakob