Philip Reames via llvm-dev
2015-Aug-31 23:15 UTC
[llvm-dev] MCRegisterClass mandatory vs preferred alignment?
On 08/31/2015 03:59 PM, Matthias Braun wrote:
> Looks to me like the alignment is specified in tablegen. From Target.td:
>
>   class RegisterClass<string namespace, list<ValueType> regTypes, int alignment,
>                       dag regList, RegAltNameIndex idx = NoRegAltName>
>
> X86RegisterInfo.td:
>
>   def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
>                             256, (sequence "YMM%u", 0, 15)>;
>   def VR256X : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
>                              256, (sequence "YMM%u", 0, 31)>;
>
> Seems to be 256 bits / 32 bytes.

Yeah, don't know how I missed that. :)

> I don't know why the alignment was specified the way it is. My guess would be because memory accesses are faster that way (because they do not cross cache lines, for example).

This is certainly true on older cores, but is it actually true on newer ones? Looking through Agner's instruction tables, it looks like the aligned and unaligned versions are essentially the same on newer Intel and AMD cores.

I was originally imagining that I'd need a custom hook or flag, but would it make sense to just use the unaligned versions if the appropriate feature flag (IsUAMem32Slow) is unset? This would result in slightly smaller code on newer architectures without (seemingly, I have no direct experience here) a performance hit.

> - Matthias
>
>> On Aug 31, 2015, at 3:21 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Looking around today, it appears that TargetRegisterClass and MCRegisterClass only include a single alignment. This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is?
>>
>> Additionally, where are these alignments actually defined? I don't see them appearing in the X86RegisterInfo.td files as I would naively expect.
>>
>> The background for my question is that I'm looking into adding a function attribute which uses unaligned loads and stores for register spilling on x86 to avoid the need for dynamic frame realignment. (See the previous thread "Aligned vector spills and variably sized stack frames".) The key difference w.r.t. the existing "no-realign-stack" attribute is that situations which *require* a stack realignment will generate a fatal_error rather than silently miscompiling. The current mechanism works by essentially ignoring the alignment criteria and just hoping everything works out in practice.
>>
>> Philip
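The decision proposed above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not LLVM code: `SpillClass`, `SpillPlan`, `alignBytes`, and `planSpill` are hypothetical names, and the `unalignedSlow` parameter merely stands in for the IsUAMem32Slow feature flag mentioned in the message. The one fact taken from the thread is that the alignment in the .td file is written in bits (256 for VR256), which corresponds to 32 bytes.

```cpp
#include <cassert>

// Hypothetical model of a register class for spilling purposes.
// None of these names are part of LLVM's API.
struct SpillClass {
  unsigned sizeBytes;  // size of one spill slot in bytes
  unsigned alignBits;  // alignment as written in the .td file, in bits
};

// Convert the TableGen-style alignment (bits) to bytes, e.g. 256 -> 32.
inline unsigned alignBytes(const SpillClass &rc) { return rc.alignBits / 8; }

enum class SpillPlan {
  AlignedNoRealign,    // the stack is already aligned enough
  AlignedWithRealign,  // keep aligned spills, pay for dynamic realignment
  Unaligned            // use unaligned loads/stores, skip realignment
};

// Pick a spill strategy. `unalignedSlow` stands in for a subtarget flag
// saying that unaligned accesses of this width are slow (an assumption
// of this sketch, modeled on the IsUAMem32Slow idea).
inline SpillPlan planSpill(const SpillClass &rc, unsigned stackAlignBytes,
                           bool unalignedSlow) {
  if (alignBytes(rc) <= stackAlignBytes)
    return SpillPlan::AlignedNoRealign;
  if (unalignedSlow)
    return SpillPlan::AlignedWithRealign;
  return SpillPlan::Unaligned;
}
```

With a 16-byte-aligned stack, a 32-byte YMM spill would take the `Unaligned` path on a core where unaligned 32-byte accesses are not flagged as slow, and would fall back to realignment otherwise. Whether that trade is actually profitable is exactly the open question in this thread.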
Matthias Braun via llvm-dev
2015-Aug-31 23:25 UTC
[llvm-dev] MCRegisterClass mandatory vs preferred alignment?
Would certainly be interesting to perform some benchmarking (llvm-testsuite/SPEC) to confirm this. I could imagine that a smaller stack footprint improves performance (or at least does not degrade it).

- Matthias
Sanjay Patel via llvm-dev
2015-Sep-01 15:16 UTC
[llvm-dev] MCRegisterClass mandatory vs preferred alignment?
The performance of unaligned accesses came up here:
http://reviews.llvm.org/D12154

Summary: while larger unaligned accesses reduce code size and can improve performance, they also increase the risk of crossing a cacheline and suffering a performance hit that will vary depending on uarch.

Crossing cachelines isn't something we account for in general, so we do generate unaligned SSE/AVX accesses for all recent x86, and we generate smaller (4/8 byte) unaligned accesses for all x86. (The criteria for generating unaligned accesses aren't entirely clear/consistent, so you'll find some 'FIXME' comments that I added recently.)

So yes, I agree that it's worth experimenting wrt unaligned stack accesses.
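To make the cacheline-crossing risk concrete, here is a minimal sketch that checks whether an access of a given size at a given address straddles two cache lines. It assumes 64-byte cache lines, which is common on x86 but is not stated anywhere in this thread; `crossesCacheLine` and `countSplits` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption of this sketch: 64-byte cache lines.
constexpr std::uint64_t kCacheLineBytes = 64;

// True if the first and last byte of the access fall in different
// cache lines. Requires sizeBytes > 0.
inline bool crossesCacheLine(std::uint64_t addr, std::uint64_t sizeBytes) {
  return addr / kCacheLineBytes != (addr + sizeBytes - 1) / kCacheLineBytes;
}

// Among all slot offsets within one cache line that satisfy the given
// alignment, count how many would produce a line-crossing access.
// Requires alignBytes > 0.
inline unsigned countSplits(std::uint64_t sizeBytes, std::uint64_t alignBytes) {
  unsigned splits = 0;
  for (std::uint64_t off = 0; off < kCacheLineBytes; off += alignBytes)
    if (crossesCacheLine(off, sizeBytes))
      ++splits;
  return splits;
}
```

Under that assumption, a 32-byte spill slot that is only 16-byte aligned can start at offsets 0, 16, 32, or 48 within a line, and only the slot at offset 48 crosses into the next line. A 32-byte-aligned slot never crosses. How costly a crossing is depends on the microarchitecture, which is why benchmarking was suggested earlier in the thread.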