Philip Reames via llvm-dev
2015-Aug-31 22:21 UTC
[llvm-dev] MCRegisterClass mandatory vs preferred alignment?
Looking around today, it appears that TargetRegisterClass and MCRegisterClass only include a single alignment. This is documented as being the minimum legal alignment, but in practice it often appears to be greater than that. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is?

Additionally, where are these alignments actually defined? I don't see them in the X86RegisterInfo.td files, as I would naively expect.

The background for my question is that I'm looking into adding a function attribute which uses unaligned loads and stores for register spilling on x86, to avoid the need for dynamic frame realignment (see the previous thread "Aligned vector spills and variably sized stack frames"). The key difference w.r.t. the existing "no-realign-stack" attribute is that situations which *require* a stack realignment will generate a fatal_error rather than silently miscompiling. The current mechanism works by essentially ignoring the alignment criteria and just hoping everything works out in practice.

Philip
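The behaviour proposed for the new attribute can be sketched roughly as follows. This is only an illustration of the decision described in the message, not LLVM code: `SpillPlan`, `SpillQuery`, `planSpill` and all field names are hypothetical, and the sketch assumes that every alignment is a power of two expressed in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of planning one vector spill; not an LLVM type.
enum class SpillPlan {
  AlignedAccess,   // the frame already guarantees the register class alignment
  UnalignedAccess, // under-aligned slot, so use an unaligned load/store
  FatalError       // something strictly needs realignment, which is forbidden
};

// Hypothetical inputs, all in bytes and all powers of two.
struct SpillQuery {
  uint32_t FrameAlign;  // alignment the frame provides without realignment
  uint32_t ClassAlign;  // alignment recorded on the register class
  uint32_t StrictAlign; // largest alignment the function cannot do without
};

inline bool isPowerOfTwo(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Decide how a spill should be emitted when the frame must not be realigned.
inline SpillPlan planSpill(const SpillQuery &Q) {
  assert(isPowerOfTwo(Q.FrameAlign) && isPowerOfTwo(Q.ClassAlign) &&
         isPowerOfTwo(Q.StrictAlign) && "alignments must be powers of two");
  if (Q.StrictAlign > Q.FrameAlign)
    return SpillPlan::FatalError; // fail loudly instead of miscompiling
  if (Q.ClassAlign <= Q.FrameAlign)
    return SpillPlan::AlignedAccess;
  return SpillPlan::UnalignedAccess;
}
```

For a frame that provides 16 bytes and a register class recorded at 32 bytes, `planSpill({16, 32, 16})` yields `UnalignedAccess`. If something in the function strictly needs more alignment than the frame provides, for example `planSpill({16, 32, 64})`, the result is `FatalError`.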
Matthias Braun via llvm-dev
2015-Aug-31 22:59 UTC
[llvm-dev] MCRegisterClass mandatory vs preferred alignment?
Looks to me like the alignment is specified in tablegen.

From Target.td:

    class RegisterClass<string namespace, list<ValueType> regTypes, int alignment,
                        dag regList, RegAltNameIndex idx = NoRegAltName>

X86RegisterInfo.td:

    def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                              256, (sequence "YMM%u", 0, 15)>;
    def VR256X : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
                               256, (sequence "YMM%u", 0, 31)>;

Seems to be 256 bits / 32 bytes.

I don't know why the alignment was specified the way it is. My guess would be because memory accesses are faster that way (because they do not cross cache lines, for example).

- Matthias
Philip Reames via llvm-dev
2015-Aug-31 23:15 UTC
[llvm-dev] MCRegisterClass mandatory vs preferred alignment?
On 08/31/2015 03:59 PM, Matthias Braun wrote:
> Looks to me like the alignment is specified in tablegen.
> [...]
> Seems to be 256 bits / 32 bytes.

Yeah, don't know how I missed that. :)

> I don't know why the alignment was specified the way it is. My guess would be because memory accesses are faster that way (because they do not cross cache lines for example).

This is certainly true on older cores, but is it actually true on newer ones? Looking through Agner's instruction tables, it looks like the aligned and unaligned versions are essentially the same on newer Intel and AMD cores.

I was originally imagining that I'd need a custom hook or flag, but would it make sense to just use the unaligned versions if the appropriate feature flag (IsUAMem32Slow) is unset? This would result in slightly smaller code on newer architectures without (seemingly, I have no direct experience here) a performance hit.
David Chisnall via llvm-dev
2015-Sep-01 09:28 UTC
[llvm-dev] MCRegisterClass mandatory vs preferred alignment?
On 31 Aug 2015, at 23:21, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Looking around today, it appears that TargetRegisterClass and MCRegisterClass only includes a single alignment. This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is?
> [...]

The alignment is a property of the register as a side effect of conflating three things:

- Registers
- Types
- Load and store operations

A register, intrinsically, should have no alignment. Alignment is solely a property of values in memory and therefore should be a property of instructions. Currently, there’s no infrastructure for providing different load and store instructions with different alignment requirements (other than writing a custom pass). There was some work about a year ago to try to deconflate these things, but I don’t think it ever made it into the tree.

David
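To make the separation concrete, here is a minimal sketch in which the alignment requirement sits on the store instruction and the register class records only a size. None of this is LLVM's actual data model: `StoreDesc`, `RegClassDesc`, `exampleStores` and `pickStore` are hypothetical names. The two table entries assume the usual x86 behaviour that `vmovaps` on a 256-bit memory operand needs a 32-byte aligned address while `vmovups` does not. Preferring the strictest form that still fits is a design choice of the sketch, not something the thread settles.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical descriptor: the alignment requirement belongs to the memory
// instruction rather than to the register class.
struct StoreDesc {
  std::string Mnemonic;
  uint32_t WidthBytes;         // number of bytes written
  uint32_t RequiredAlignBytes; // address alignment the instruction demands
};

// Hypothetical register class that records a size and nothing about memory.
struct RegClassDesc {
  std::string Name;
  uint32_t SizeBytes;
};

// Example table for 256-bit stores (an assumption made for illustration).
inline std::vector<StoreDesc> exampleStores() {
  return {{"vmovaps", 32, 32}, {"vmovups", 32, 1}};
}

// Return the store with the strictest alignment requirement that the slot
// still satisfies, or nullptr if no store of the right width qualifies.
inline const StoreDesc *pickStore(const std::vector<StoreDesc> &Stores,
                                  const RegClassDesc &RC,
                                  uint32_t SlotAlignBytes) {
  assert(SlotAlignBytes >= 1 && "every address is at least 1-byte aligned");
  const StoreDesc *Best = nullptr;
  for (const StoreDesc &S : Stores) {
    if (S.WidthBytes != RC.SizeBytes || S.RequiredAlignBytes > SlotAlignBytes)
      continue; // wrong width, or the slot is not aligned enough
    if (!Best || S.RequiredAlignBytes > Best->RequiredAlignBytes)
      Best = &S;
  }
  return Best;
}
```

With this layout, a 32-byte aligned slot selects `vmovaps` and a 16-byte aligned slot falls back to `vmovups`, without the register class carrying any alignment at all.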