thr3ads.net - llvm dev - [llvm-dev] MCRegisterClass mandatory vs preferred alignment? [Aug 2015]

Philip Reames via llvm-dev

2015-Aug-31 23:15 UTC

[llvm-dev] MCRegisterClass mandatory vs preferred alignment?

On 08/31/2015 03:59 PM, Matthias Braun wrote:
> Looks to me like the alignment is specified in tablegen. From Target.td:
>
> class RegisterClass<string namespace, list<ValueType> regTypes, int alignment,
>                     dag regList, RegAltNameIndex idx = NoRegAltName>
>
> X86RegisterInfo.td:
>
> def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
>                           256, (sequence "YMM%u", 0, 15)>;
> def VR256X : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
>                            256, (sequence "YMM%u", 0, 31)>;
>
> Seems to be 256bits/32bytes.

Yeah, don't know how I missed that.  :)

> I don't know why the alignment was specified the way it is. My guess
> would be because memory accesses are faster that way (because they do not
> cross cache lines for example).

This is certainly true on older cores, but is it actually true on newer
ones?  Looking through Agner's instruction tables, it looks like the
aligned and unaligned versions are essentially the same on newer Intel
and AMD cores.

I was originally imagining that I'd need a custom hook or flag, but
would it make sense to just use the unaligned versions if the
appropriate feature flag (IsUAMem32Slow) is unset?  This would result in
slightly smaller code on newer architectures without (seemingly, I have
no direct experience here) a performance hit.

> - Matthias
>
>> On Aug 31, 2015, at 3:21 PM, Philip Reames via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> Looking around today, it appears that TargetRegisterClass and
>> MCRegisterClass only include a single alignment.  This is documented as
>> being the minimum legal alignment, but it appears to often be greater
>> than this in practice.  For instance, on x86 the alignment of %ymm0 is
>> listed as 32, not 1.  Does anyone know why this is?
>>
>> Additionally, where are these alignments actually defined?  I don't
>> see them appearing in the X86RegisterInfo.td files as I would naively
>> expect.
>>
>> The background for my question is that I'm looking into adding a
>> function attribute which uses unaligned loads and stores for register
>> spilling on x86 to avoid the need for dynamic frame realignment.  (See
>> the previous thread "Aligned vector spills and variably sized stack
>> frames".)  The key difference w.r.t. the existing "no-realign-stack"
>> attribute is that situations which *require* a stack realignment will
>> generate a fatal_error rather than silently miscompiling.  The current
>> mechanism works by essentially ignoring the alignment criteria and just
>> hoping everything works out in practice.
>>
>> Philip
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
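
The policy discussed in this message (use unaligned spill instructions when the slow-unaligned feature flag is unset, and otherwise either realign the stack or report an error) can be sketched roughly as below. This is only an illustrative model under stated assumptions: `SubtargetModel`, `SpillKind`, `SpillPlan`, and `planSpill` are hypothetical names, not LLVM APIs, and the exact conditions under which the proposed attribute would report an error are a guess at the intent rather than anything the thread specifies.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a subtarget; not an LLVM class.
struct SubtargetModel {
  bool slowUnaligned32; // models the IsUAMem32Slow feature flag from the thread
};

enum class SpillKind { Aligned, Unaligned };

struct SpillPlan {
  SpillKind kind;    // which flavour of load/store to use for the spill slot
  bool needsRealign; // whether the frame must be dynamically realigned
};

// RegisterClass alignment is written in bits in TableGen (256 for VR256),
// which corresponds to 32 bytes.
inline unsigned bitsToBytes(unsigned bits) {
  assert(bits % 8 == 0 && "alignment should be a whole number of bytes");
  return bits / 8;
}

// Assumed policy, for illustration only:
//  1. If the incoming stack alignment already satisfies the register class,
//     use aligned spills and no realignment.
//  2. Otherwise, if unaligned 32-byte accesses are not slow, use unaligned
//     spills and skip realignment.
//  3. Otherwise realignment is required; if it is forbidden, fail loudly
//     instead of silently emitting misaligned "aligned" accesses.
inline SpillPlan planSpill(const SubtargetModel &st, unsigned regClassAlignBits,
                           unsigned stackAlignBytes, bool forbidRealign) {
  const unsigned wantBytes = bitsToBytes(regClassAlignBits);
  if (wantBytes <= stackAlignBytes)
    return {SpillKind::Aligned, false};
  if (!st.slowUnaligned32)
    return {SpillKind::Unaligned, false};
  if (forbidRealign)
    throw std::runtime_error("stack realignment required but forbidden");
  return {SpillKind::Aligned, true};
}
```

For a YMM spill (256 bits) on a 16-byte-aligned stack, this model picks unaligned spills when `slowUnaligned32` is false, and falls back to realignment (or an error) when it is true.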

Matthias Braun via llvm-dev

2015-Aug-31 23:25 UTC

Would certainly be interesting to perform some benchmarking
(llvm-testsuite/spec) to confirm this. I could imagine that a smaller stack
footprint improves performance (or at least does not degrade it).

- Matthias

Sanjay Patel via llvm-dev

2015-Sep-01 15:16 UTC

The performance of unaligned accesses came up here:
http://reviews.llvm.org/D12154

Summary: while larger unaligned accesses reduce code size and can improve
performance, they also increase the risk of crossing a cacheline and
suffering a performance hit that will vary depending on uarch. Crossing
cachelines isn't something we account for in general, so we do generate
unaligned SSE/AVX accesses for all recent x86, and we generate smaller (4/8
byte) unaligned accesses for all x86. (The criteria for generating
unaligned accesses aren't entirely clear or consistent, so you'll find some
'FIXME' comments that I added recently.)

So yes, I agree that it's worth experimenting wrt unaligned stack accesses.
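
To make the cacheline-crossing risk mentioned above concrete, here is a small sketch that counts how many starting offsets within a cache line cause an access of a given size to straddle two lines. The 64-byte line size is an assumption (typical of recent x86 cores, but not stated in the thread), and the helper names `crossesCacheLine` and `countCrossingOffsets` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [addr, addr + size) touches more than one cache line.
// lineSize defaults to 64 bytes, which is an assumption about the target.
inline bool crossesCacheLine(std::uint64_t addr, std::uint64_t size,
                             std::uint64_t lineSize = 64) {
  assert(lineSize != 0 && (lineSize & (lineSize - 1)) == 0 &&
         "line size must be a power of two");
  if (size == 0)
    return false;
  return (addr / lineSize) != ((addr + size - 1) / lineSize);
}

// Counts the starting offsets within one cache line, stepping by `align`
// bytes, at which an access of `size` bytes would cross into the next line.
inline unsigned countCrossingOffsets(std::uint64_t size, std::uint64_t align,
                                     std::uint64_t lineSize = 64) {
  assert(align != 0 && "alignment must be non-zero");
  unsigned crossings = 0;
  for (std::uint64_t off = 0; off < lineSize; off += align)
    if (crossesCacheLine(off, size, lineSize))
      ++crossings;
  return crossings;
}
```

Under these assumptions, a 32-byte access that is 32-byte aligned never crosses a line, one that is only 16-byte aligned crosses at one of its four possible offsets (offset 48), and a completely unaligned one crosses at 31 of 64 offsets. How costly each crossing is depends on the microarchitecture, which is what the benchmarking suggested earlier in the thread would need to measure.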
