thr3ads.net - llvm dev - [llvm-dev] Aligned vector spills and variably sized stack frames [Aug 2015]


Hal Finkel via llvm-dev

2015-Aug-28 23:29 UTC

[llvm-dev] Aligned vector spills and variably sized stack frames

----- Original Message -----
> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, August 28, 2015 6:21:00 PM
> Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames
> 
> On 08/28/2015 04:00 PM, Philip Reames via llvm-dev wrote:
> > I've run into a problem that I'm trying to figure out how to address
> > and would welcome ideas and feedback.
> >
> > Today, the vectorizer will nicely vectorize loops using the widest
> > legal vector type for the target.  On a reasonably recent machine,
> > this will often end up using AVX2 registers, which are 32 bytes wide.
> >
> > If, during register allocation, we decide to spill one of these
> > registers, we use the vmovaps instruction, which requires the memory
> > address accessed to be 32-byte aligned.  So far, so good.
> >
> > However, the C ABI generally only provides 16 bytes of alignment for
> > the stack on entry to the function.  To work around this, the backend
> > will create a variably sized frame with a dynamic amount of padding
> > inserted if required to ensure that a 32-byte aligned spill slot is
> > available.
> >
> > The problem I have is that my runtime's ABI really doesn't like
> > variably sized frames.  In particular, the assumption that stack
> > frames are fixed size (except during the prologue and epilogue) is
> > fairly baked in.
> >
> > I'm weighing a couple of options for addressing this and want to
> > gather feedback on the perceived difficulty of each.  If someone has
> > another approach, I'm also very open to that.
> >
> > Option 1 - Fix my runtime to not expect mostly fixed size frames.
> > This isn't a small change to make, but given it's a strictly internal
> > ABI, I can probably get away with doing it.  Given that things like
> > shrink-wrapping are coming down the pipe, it might also have secondary
> > benefits.  However, this is a relatively risky change to make for what
> > is really a corner case.
> >
> > Option 1a - I could change my ABI to use a 32-byte aligned frame.
> > This has many of the same problems as (1).
> >
> > Option 2 - Don't compile things which need to spill vector registers.
> > This is actually what we do today and has worked out fairly well in
> > practice.  This is what I'm hoping to move away from.
> >
> > Option 3 - Add an option in the x86 backend to not require aligned
> > spill slots for AVX2 registers.  In particular, the VMOVUPS
> > instruction can be used to spill vector registers into an 8- or
> > 16-byte aligned spill slot and not require dynamic frame realignment.
> > This seems like it might be useful in other contexts as well, but I
> > can't name any at the moment.
> >
> > One thing that occurs to me is that many spills are on rarely taken
> > paths.  Maybe it would make sense to only do dynamic alignment for hot
> > spills/reloads?  We could then simply override the heuristic to always
> > use unaligned spills.
> >
> > I don't really have a sense for how hard (3) would be to implement.
> > Anyone have an intuition?
> After sending this, I did another search and promptly discovered the
> existing "no-realign-stack" function attribute, which seems to do
> exactly what I need.  Anyone know if this is robust?
I believe this works correctly, but it is not a targeted fix for the AVX
spilling problem ;) and I can certainly imagine such a targeted feature being
generally desirable. Specifically, with this attribute all overaligned locals
will simply fail to be overaligned (and, thus, the resulting code will likely
be broken). In your case, I imagine you can simply promise never to create
such things, and you'll be fine.

 -Hal
> >
> > Philip
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Philip Reames via llvm-dev

2015-Aug-29 00:03 UTC


[llvm-dev] Aligned vector spills and variably sized stack frames

On 08/28/2015 04:29 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
>> To: "llvm-dev" <llvm-dev at lists.llvm.org>
>> Sent: Friday, August 28, 2015 6:21:00 PM
>> Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames
>>
>> After sending this, I did another search and promptly discovered the
>> existing "no-realign-stack" function attribute, which seems to do
>> exactly what I need.  Anyone know if this is robust?
> I believe this works correctly, but it is not a targeted fix for the AVX
> spilling problem ;) and I can certainly imagine such a targeted feature
> being generally desirable. Specifically, with this attribute all
> overaligned locals will simply fail to be overaligned (and, thus, the
> resulting code will likely be broken). In your case, I imagine you can
> simply promise never to create such things, and you'll be fine.

To restate, you're saying that if I had a load or store with alignment
greater than the native frame size, that using this option might cause
that alignment not to be respected?  That would work in practice, but I
should probably solve this in a more principled way to avoid future
pain.  However, given your comments and the existing attribute,
implementing something along the lines of my option (3) above shouldn't
be too hard.  I'll likely post a patch in that direction next week.

Thanks for the guidance.

Philip

Hal Finkel via llvm-dev

2015-Aug-29 00:08 UTC


[llvm-dev] Aligned vector spills and variably sized stack frames

----- Original Message -----
> From: "Philip Reames" <listmail at philipreames.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, August 28, 2015 7:03:24 PM
> Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames
> 
> 
> 
> On 08/28/2015 04:29 PM, Hal Finkel wrote:
> > ----- Original Message -----
> >> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> >> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> >> Sent: Friday, August 28, 2015 6:21:00 PM
> >> Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames
> >>
> >> After sending this, I did another search and promptly discovered
> >> the existing "no-realign-stack" function attribute, which seems to
> >> do exactly what I need.  Anyone know if this is robust?
> > I believe this works correctly, but it is not a targeted fix for the
> > AVX spilling problem ;) and I can certainly imagine such a targeted
> > feature being generally desirable. Specifically, with this attribute
> > all overaligned locals will simply fail to be overaligned (and, thus,
> > the resulting code will likely be broken). In your case, I imagine
> > you can simply promise never to create such things, and you'll be
> > fine.
> To restate, you're saying that if I had a load or store with
> alignment greater than the native frame size, that using this option
> might cause that alignment not to be respected?
No, what I'm saying is that if you were to create an alloca instruction with
an alignment specified to be greater than the ABI stack alignment, and you use
no-realign-stack to disable all stack realignment, then the resulting stack slot
may simply not have the requested alignment.

 -Hal
> That would work in practice, but I should probably solve this in a
> more principled way to avoid future pain.  However, given your
> comments and the existing attribute, implementing something along the
> lines of my option (3) above shouldn't be too hard.  I'll likely post
> a patch in that direction next week.
> 
> Thanks for the guidance.
> 
> Philip
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
