thr3ads.net - llvm dev - [llvm-dev] Aligned vector spills and variably sized stack frames [Aug 2015]


Philip Reames via llvm-dev

2015-Aug-28 23:00 UTC

[llvm-dev] Aligned vector spills and variably sized stack frames

I've run into a problem that I'm trying to figure out how to address and
would welcome ideas and feedback.

Today, the vectorizer will nicely vectorize loops using the widest legal 
vector type for the target.  On a reasonably recent machine, this will 
often end up using AVX2 registers, which are 32 bytes wide.

If, during register allocation, we decide to spill one of these 
registers, we use the vmovaps instruction, which requires the memory 
address accessed to be 32-byte aligned.  So far, so good.

However, the C ABI generally only guarantees 16-byte alignment for the 
stack on entry to the function.  To work around this, the backend will 
create a variably sized frame, inserting a dynamic amount of padding 
where required to ensure that a 32-byte-aligned spill slot is available.
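
A minimal sketch of the arithmetic behind that padding, assuming a downward-growing stack whose pointer is rounded down to the required alignment (roughly what an `and rsp, -32` in a realigning prologue does). The helper names are hypothetical. Because the padding depends on the runtime value of the incoming stack pointer, the frame size is not a compile-time constant:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round an address down to a power-of-two alignment.
// A realigning prologue does the equivalent with an AND mask on the stack pointer.
constexpr std::uint64_t alignDown(std::uint64_t addr, std::uint64_t align) {
  return addr & ~(align - 1);
}

// Hypothetical helper: bytes of padding the realignment inserts below the
// incoming stack pointer. The result depends on the runtime value of
// incomingSP, which is what makes the frame variably sized.
std::uint64_t realignPadding(std::uint64_t incomingSP, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "align must be a power of two");
  return incomingSP - alignDown(incomingSP, align);
}
```

With a 16-byte-aligned incoming stack pointer and a 32-byte requirement, `realignPadding` returns either 0 or 16, depending on where the caller left the stack.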

The problem I have is that my runtime's ABI really doesn't like variably
sized frames.  In particular, the assumption that stack frames are fixed 
size - except during the prologue and epilogue - is fairly baked in.

I'm weighing a couple of options for addressing this and want to gather 
feedback on the perceived difficulty of each.  If someone has another 
approach, I'm also very open to that.

Option 1 - Fix my runtime to not expect mostly fixed-size frames.  This 
isn't a small change to make, but given it's a strictly internal ABI, I 
can probably get away with doing it.  Given things like shrink-wrapping 
are coming down the pipe, it might also have secondary benefits.  
However, this is a relatively risky change to make for a fairly rare 
corner case.

Option 1a - I could change my ABI to use a 32-byte-aligned frame.  This 
has many of the same problems as (1).

Option 2 - Don't compile things which need to spill vector registers.  
This is actually what we do today and has worked out fairly well in 
practice.  This is what I'm hoping to move away from.

Option 3 - Add an option in the x86 backend to not require aligned spill 
slots for AVX2 registers.  In particular, the VMOVUPS instruction can be 
used to spill vector registers into an 8- or 16-byte-aligned spill slot 
without requiring dynamic frame realignment.  This seems like it might be 
useful in other contexts as well, but I can't name any at the moment.
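
As a rough illustration of Option 3, the choice reduces to picking the move instruction from the alignment the spill slot is known to have. This is a sketch with a hypothetical function name, not the backend's actual opcode selection; it relies only on the architectural rule that a 256-bit `vmovaps` memory operand must be 32-byte aligned, while `vmovups` has no alignment requirement:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: choose the mnemonic for spilling or reloading a
// 256-bit ymm register, given the alignment (in bytes) guaranteed for the slot.
std::string_view ymmSpillMnemonic(unsigned slotAlignBytes) {
  assert(slotAlignBytes != 0 && (slotAlignBytes & (slotAlignBytes - 1)) == 0 &&
         "slot alignment must be a power of two");
  // vmovaps faults on a 256-bit operand that is not 32-byte aligned;
  // vmovups accepts any address, so it is safe for 8- or 16-byte-aligned slots.
  return slotAlignBytes >= 32 ? "vmovaps" : "vmovups";
}
```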

One thing that occurs to me is that many spills are on rare paths.  
Maybe it would make sense to only do dynamic alignment for hot 
spills/reloads?  We could then simply override the heuristic to always use 
unaligned spills.
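
One way to picture the proposed heuristic, as a sketch only: the `VectorSpill` struct, its hotness flag, and the `forceUnaligned` override are hypothetical names, and how hotness would actually be measured (for example, from block frequency) is left open. The frame is realigned only when a spill wider than the incoming stack alignment sits on a hot path; every other wide spill would use an unaligned move:

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one vector spill slot.
struct VectorSpill {
  unsigned widthBytes;  // e.g. 32 for a ymm register
  bool onHotPath;       // hypothetical hotness flag, however it is computed
};

// Returns true if the frame should be dynamically realigned.
// forceUnaligned models the override that always uses unaligned spills.
bool shouldRealignFrame(const std::vector<VectorSpill>& spills,
                        unsigned incomingAlignBytes, bool forceUnaligned) {
  assert(incomingAlignBytes != 0 && "incoming alignment must be nonzero");
  if (forceUnaligned)
    return false;  // never realign; all wide spills use unaligned moves
  for (const VectorSpill& s : spills)
    if (s.widthBytes > incomingAlignBytes && s.onHotPath)
      return true;  // a hot wide spill benefits from an aligned slot
  return false;     // only cold (or narrow) spills: unaligned moves suffice
}
```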

I don't really have a sense for how hard (3) would be to implement. 
Anyone have an intuition?

Philip

Philip Reames via llvm-dev

2015-Aug-28 23:21 UTC

On 08/28/2015 04:00 PM, Philip Reames via llvm-dev wrote:
> I don't really have a sense for how hard (3) would be to implement.
> Anyone have an intuition?

After sending this, I did another search and promptly discovered the
existing "no-realign-stack" function attribute, which seems to do exactly
what I need.  Does anyone know if this is robust?

Hal Finkel via llvm-dev

2015-Aug-28 23:23 UTC

----- Original Message -----
> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, August 28, 2015 6:00:50 PM
> Subject: [llvm-dev] Aligned vector spills and variably sized stack frames
> 
> I don't really have a sense for how hard (3) would be to implement.
> Anyone have an intuition?
I suspect that implementing this would not be too difficult. There are
essentially two things that need to be changed:

 1. Change the code in X86InstrInfo::storeRegToStackSlot /
X86InstrInfo::loadRegFromStackSlot to do the right thing for underaligned stack
slots (or, in general, under the control of some target feature, option, etc.).
Specifically, you need to change the code in those functions to pass false to
the isStackAligned parameter of getStoreRegOpcode and getLoadRegOpcode.

 2. The alignment necessary for register spills is generically specified in the
target's *RegisterInfo.td file (it's the third parameter of the RegisterClass
TableGen type). You'd need to specify a way to override that based on some
target feature, option, etc., if one does not already exist.
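
A sketch of how the two changes might fit together, under stated assumptions: `SpillQuery`, `SpillDecision`, `decideSpill`, and the `allowUnalignedSpills` option are hypothetical and are not LLVM APIs. `regClassAlign` stands in for the third `RegisterClass` parameter (converted to bytes; TableGen gives it in bits), and `isStackAligned` stands in for the flag passed to getStoreRegOpcode/getLoadRegOpcode. This is a simplified model: without the override, the flag is true when the incoming stack alignment suffices or the frame may be realigned.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inputs to the spill decision (all alignments in bytes).
struct SpillQuery {
  unsigned regClassAlign;     // e.g. 32 for 256-bit vector registers
  unsigned stackAlign;        // alignment guaranteed on entry, e.g. 16
  bool canRealign;            // may the frame be dynamically realigned?
  bool allowUnalignedSpills;  // hypothetical target option/feature
};

struct SpillDecision {
  unsigned slotAlign;   // alignment requested for the spill slot
  bool isStackAligned;  // true: the aligned load/store opcode is safe
};

SpillDecision decideSpill(const SpillQuery& q) {
  assert(q.regClassAlign != 0 && q.stackAlign != 0);
  if (q.allowUnalignedSpills) {
    // Change 2: override the register-class spill alignment so the slot
    // never asks for more than the incoming stack alignment.
    unsigned slotAlign = std::min(q.regClassAlign, q.stackAlign);
    // Change 1: when the slot is under-aligned, request unaligned opcodes.
    return {slotAlign, slotAlign >= q.regClassAlign};
  }
  // Default behavior: keep the register-class alignment and rely on
  // realignment when the incoming stack alignment is insufficient.
  return {q.regClassAlign, q.stackAlign >= q.regClassAlign || q.canRealign};
}
```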

 -Hal
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel via llvm-dev

2015-Aug-28 23:29 UTC

----- Original Message -----
> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, August 28, 2015 6:21:00 PM
> Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames
> 
> On 08/28/2015 04:00 PM, Philip Reames via llvm-dev wrote:
> After sending this, I did another search and promptly discovered the
> existing "no-realign-stack" function attribute which seems to do exactly
> what I need.  Anyone know if this is robust?
I believe this works correctly, but it is not a targeted fix for the AVX
spilling problem ;) -- and I can certainly imagine such a targeted feature
being generally desirable. Specifically, all overaligned locals will simply fail
to be overaligned (and, thus, the resulting code will likely be broken). In your
case, I imagine you can simply promise never to create such things, and you'll
be fine.
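
To make the failure mode concrete, here is a hypothetical model (not LLVM code) of the alignment a stack object ends up with: when realignment is disabled, any request above the incoming stack alignment is silently capped, so an overaligned local receives less alignment than it asked for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: alignment (in bytes) a stack object is actually
// guaranteed, given what it requested and what the frame can provide.
std::uint64_t guaranteedAlign(std::uint64_t requested,
                              std::uint64_t incomingStackAlign,
                              bool realignAllowed) {
  assert(requested != 0 && (requested & (requested - 1)) == 0);
  assert(incomingStackAlign != 0 &&
         (incomingStackAlign & (incomingStackAlign - 1)) == 0);
  if (requested <= incomingStackAlign || realignAllowed)
    return requested;         // the request can be honored
  return incomingStackAlign;  // capped: the object is silently under-aligned
}
```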

 -Hal

Joseph Tremoulet via llvm-dev

2015-Aug-31 01:09 UTC

>  If someone has another approach, I'm also very open to that.
I recently saw another compiler use a "cute trick" for this sort of
thing: allocate a fixed amount of space on the stack by including the
worst-case padding, and then dynamically set the frame pointer to an aligned
location within that.  I wouldn't go so far as to say that's a *better*
approach (it seems gimmicky/fishy and could open other problems by surprising
your runtime/tools in other ways), but it's certainly *another* one :).  The
only actual benefit that comes to mind is that it would cover sources of
dynamic alignment other than RA spill slots, if that's something you need to
worry about.
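
A minimal sketch of that trick, assuming a downward-growing stack, a body size that is a multiple of the incoming alignment, and power-of-two alignments; `FixedFrame` and `layoutFixedFrame` are hypothetical names. The frame size is a constant (body plus worst-case padding), and only the aligned base that the frame pointer would be set to varies at run time:

```cpp
#include <cassert>
#include <cstdint>

struct FixedFrame {
  std::uint64_t size;         // constant frame size, known at compile time
  std::uint64_t alignedBase;  // runtime address the frame pointer would hold
};

constexpr bool isPow2(std::uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical layout: reserve bodySize plus worst-case padding, then round
// the new stack pointer up to maxAlign to obtain an aligned frame base.
FixedFrame layoutFixedFrame(std::uint64_t entrySP, std::uint64_t bodySize,
                            std::uint64_t incomingAlign, std::uint64_t maxAlign) {
  assert(isPow2(incomingAlign) && isPow2(maxAlign) && maxAlign >= incomingAlign);
  assert(entrySP % incomingAlign == 0 && bodySize % incomingAlign == 0);
  const std::uint64_t worstPad = maxAlign - incomingAlign;
  FixedFrame f{};
  f.size = bodySize + worstPad;                  // same for every call
  const std::uint64_t newSP = entrySP - f.size;  // stack grows down
  f.alignedBase = (newSP + maxAlign - 1) & ~(maxAlign - 1);  // round up
  // The aligned region [alignedBase, alignedBase + bodySize) fits in the frame.
  assert(f.alignedBase + bodySize <= entrySP);
  return f;
}
```

Two calls entered at different 16-byte-aligned stack pointers get the same frame size, while their aligned bases differ only as needed to reach 32-byte alignment.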

-Joseph



Herbie Robinson via llvm-dev

2015-Aug-31 03:39 UTC

If one uses that trick, one should combine all the items needing the 
large alignment into one allocation.  Otherwise, one will be allocating 
extra space all over the place, along with needing a pointer variable for 
every aligned object.
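
A rough sketch of that suggestion, with hypothetical names: lay out every overaligned object inside one block whose base is aligned to the largest requirement, so the worst-case padding is paid once and a single base pointer plus constant offsets addresses every object.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct OverAlignedItem {
  std::uint64_t size;   // bytes
  std::uint64_t align;  // power of two, in bytes
};

struct CombinedBlock {
  std::vector<std::uint64_t> offsets;  // offset of each item from the aligned base
  std::uint64_t reserve;               // bytes to reserve, including one worst-case pad
};

constexpr std::uint64_t alignUp(std::uint64_t x, std::uint64_t a) {
  return (x + a - 1) & ~(a - 1);
}

CombinedBlock combine(const std::vector<OverAlignedItem>& items,
                      std::uint64_t incomingAlign) {
  assert(incomingAlign != 0 && (incomingAlign & (incomingAlign - 1)) == 0);
  CombinedBlock b{{}, 0};
  std::uint64_t used = 0;
  std::uint64_t maxAlign = incomingAlign;
  for (const OverAlignedItem& it : items) {
    assert(it.align != 0 && (it.align & (it.align - 1)) == 0);
    used = alignUp(used, it.align);  // offset is a multiple of it.align
    b.offsets.push_back(used);
    used += it.size;
    if (it.align > maxAlign) maxAlign = it.align;
  }
  // One padding allowance for the whole block, rather than one per object.
  b.reserve = alignUp(used, incomingAlign) + (maxAlign - incomingAlign);
  return b;
}
```

Because the base is aligned to the largest requirement and each offset is a multiple of its item's alignment, every object in the block ends up correctly aligned.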

On 8/30/15 9:09 PM, Joseph Tremoulet via llvm-dev wrote:
>> If someone has another approach, I'm also very open to that.
> I recently saw another compiler use a "cute trick" for this sort of
> thing: allocate a fixed amount of space on the stack by including the
> worst-case padding, and then dynamically set the frame pointer to an
> aligned location within that.  I wouldn't go so far as to say that's a
> *better* approach (it seems gimmicky/fishy and could open other problems
> by surprising your runtime/tools in other ways), but it's certainly
> *another* one :).  The only actual benefit that comes to mind is that it
> would cover other sources of dynamic alignment than RA spill slots, if
> that's something you need to worry about.
>
> -Joseph

Reid Kleckner via llvm-dev

2015-Aug-31 16:36 UTC

On Sun, Aug 30, 2015 at 6:09 PM, Joseph Tremoulet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >  If someone has another approach, I'm also very open to that.
>
> I recently saw another compiler use a "cute trick" for this sort of thing:
> allocate a fixed amount of space on the stack by including the worst-case
> padding, and then dynamically set the frame pointer to an aligned location
> within that.  I wouldn't go so far as to say that's a *better* approach (it
> seems gimmicky/fishy and could open other problems by surprising your
> runtime/tools in other ways), but it's certainly *another* one :).  The
> only actual benefit that comes to mind is that it would cover other sources
> of dynamic alignment than RA spill slots, if that's something you need to
> worry about.
>
I came here to suggest this approach also. :)

Right now the X86 backend is using a stack realignment prologue that is
designed to fix up the incoming and outgoing alignment to some number. We
only need this prologue when the user is telling us that the incoming
alignment is too low and must be fixed up (i.e. -mstackrealign).