Philip Reames via llvm-dev
2015-Aug-28 23:00 UTC
[llvm-dev] Aligned vector spills and variably sized stack frames
I've run into a problem that I'm trying to figure out how to address and would welcome ideas and feedback.

Today, the vectorizer will nicely vectorize loops using the widest legal vector type for the target. On a reasonably recent machine, this will often end up using AVX2 registers, which are 32 bytes wide.

If, during register allocation, we decide to spill one of these registers, we use the vmovaps instruction, which requires the memory address accessed to be 32-byte aligned. So far, so good.

However, the C ABI generally only provides 16 bytes of alignment for the stack on entry to the function. To work around this, the backend will create a variably sized frame, inserting a dynamic amount of padding if required to ensure that a 32-byte-aligned spill slot is available.

The problem I have is that my runtime's ABI really doesn't like variably sized frames. In particular, the assumption that stack frames are fixed size - except during prologue and epilogue - is fairly baked in.

I'm weighing a couple of options for addressing this and want to gather feedback on the perceived difficulty of each. If someone has another approach, I'm also very open to that.

Option 1 - Fix my runtime to not expect mostly fixed size frames. This isn't a small change to make, but given it's a strictly internal ABI, I can probably get away with doing it. Given things like shrink-wrapping are coming down the pipe, it might also have secondary benefits. However, this is a relatively risky change to make for a fairly rare corner case.

Option 1a - I could change my ABI to use a 32-byte-aligned frame. This has many of the same problems as (1).

Option 2 - Don't compile things which need to spill vector registers. This is actually what we do today and has worked out fairly well in practice. This is what I'm hoping to move away from.

Option 3 - Add an option in the x86 backend to not require aligned spill slots for AVX2 registers. In particular, the VMOVUPS instruction can be used to spill vector registers into an 8- or 16-byte-aligned spill slot and not require dynamic frame realignment. This seems like it might be useful in other contexts as well, but I can't name any at the moment.

One thing that occurs to me is that many spills are down rare paths. Maybe it would make sense to only do dynamic alignment for hot spills/reloads? We could then simply override the heuristic to always use unaligned spills.

I don't really have a sense for how hard (3) would be to implement. Anyone have an intuition?

Philip
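To make the "dynamic amount of padding" concrete, here is a minimal sketch of the arithmetic involved. It is only an illustration: the helper names are made up for this example and are not LLVM functions. It assumes a downward-growing stack (as on x86) and a power-of-two alignment, and models what an "and the stack pointer with -32" style realignment computes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round an address down to a multiple of `align`.
// `align` must be a power of two.
inline std::uint64_t alignDown(std::uint64_t addr, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return addr & ~(align - 1);
}

// Hypothetical helper: number of padding bytes a realigning prologue would
// have to insert below `sp` so that the resulting stack pointer is
// `align`-aligned. The value depends on the run-time `sp`, which is why the
// frame size is not a compile-time constant.
inline std::uint64_t realignPadding(std::uint64_t sp, std::uint64_t align) {
  return sp - alignDown(sp, align);
}
```

With a 16-byte-aligned incoming stack pointer and a 32-byte requirement, the padding is either 0 or 16 bytes depending on the caller, so two activations of the same function can have frames of different sizes.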
Philip Reames via llvm-dev
2015-Aug-28 23:21 UTC
[llvm-dev] Aligned vector spills and variably sized stack frames
On 08/28/2015 04:00 PM, Philip Reames via llvm-dev wrote:
> I don't really have a sense for how hard (3) would be to implement.
> Anyone have an intuition?

After sending this, I did another search and promptly discovered the existing "no-realign-stack" function attribute, which seems to do exactly what I need. Anyone know if this is robust?
Hal Finkel via llvm-dev
2015-Aug-28 23:23 UTC
[llvm-dev] Aligned vector spills and variably sized stack frames
----- Original Message -----
> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, August 28, 2015 6:00:50 PM
> Subject: [llvm-dev] Aligned vector spills and variably sized stack frames
>
> I don't really have a sense for how hard (3) would be to implement.
> Anyone have an intuition?

I suspect that implementing this would not be too difficult. There are essentially two things that need to be changed:

1. Change the code in X86InstrInfo::storeRegToStackSlot / X86InstrInfo::loadRegFromStackSlot to do the right thing for underaligned stack slots (or, in general, under the control of some target feature, option, etc.). Specifically, you need to change the code in those functions to pass false to the isStackAligned parameter of getStoreRegOpcode and getLoadRegOpcode.

2. The alignment necessary for register spills is generically specified in the target's *RegisterInfo.td file (it's the third parameter of the RegisterClass TableGen type). You'd need to specify a way to override that based on some target feature, option, etc., if one does not already exist.

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
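As a rough illustration of the first change, the decision being discussed boils down to "use the aligned move only when the slot is known to be sufficiently aligned". The sketch below is a standalone model of that decision, not LLVM's actual code: the function name, its parameters, and the `forceUnaligned` switch are all hypothetical stand-ins for whatever target feature or option would control this.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of picking a spill instruction for a 32-byte YMM
// register. `slotAlignBytes` is the alignment the frame guarantees for the
// spill slot; `forceUnaligned` stands in for an option that always requests
// unaligned spills so the frame never needs dynamic realignment.
inline std::string spillStoreMnemonic(unsigned slotAlignBytes,
                                      bool forceUnaligned) {
  const unsigned kYmmBytes = 32;
  // Roughly the role the thread attributes to the isStackAligned parameter:
  // true selects the aligned form, false selects the unaligned form.
  const bool isStackAligned = !forceUnaligned && slotAlignBytes >= kYmmBytes;
  return isStackAligned ? "vmovaps" : "vmovups";
}
```

Under this model, a 16-byte-aligned slot always gets vmovups, and setting the hypothetical switch yields vmovups even for slots that happen to be 32-byte aligned. The second change (the spill alignment in the register class definition) is what would stop the frame from requesting 32-byte alignment in the first place.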
Hal Finkel via llvm-dev
2015-Aug-28 23:29 UTC
[llvm-dev] Aligned vector spills and variably sized stack frames
----- Original Message -----
> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, August 28, 2015 6:21:00 PM
> Subject: Re: [llvm-dev] Aligned vector spills and variably sized stack frames
>
> After sending this, I did another search and promptly discovered the
> existing "no-realign-stack" function attribute which seems to do
> exactly what I need. Anyone know if this is robust?

I believe this works correctly, but it is not a targeted fix for the AVX spilling problem ;) (and I can certainly imagine such a targeted feature being generally desirable). Specifically, all overaligned locals will simply fail to be overaligned (and, thus, the resulting code will likely be broken). In your case, I can imagine you can simply promise never to create such things, and you'll be fine.

-Hal
Joseph Tremoulet via llvm-dev
2015-Aug-31 01:09 UTC
[llvm-dev] Aligned vector spills and variably sized stack frames
> If someone has another approach, I'm also very open to that.

I recently saw another compiler use a "cute trick" for this sort of thing: allocate a fixed amount of space on the stack by including the worst-case padding, and then dynamically set the frame pointer to an aligned location within that. I wouldn't go so far as to say that's a *better* approach (it seems gimmicky/fishy and could open other problems by surprising your runtime/tools in other ways), but it's certainly *another* one :). The only actual benefit that comes to mind is that it would cover other sources of dynamic alignment than RA spill slots, if that's something you need to worry about.

-Joseph
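The "fixed size with worst-case padding" trick can be sketched as follows. This is only an illustration under stated assumptions: the helper names are invented for this example, alignments are powers of two, and `sp` is the lowest address of the frame after the fixed-size allocation and is itself aligned to the incoming (ABI) alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the compile-time-constant frame size. The payload is
// enlarged by the worst-case distance between an incoming-aligned address
// and the next wanted-aligned address.
inline std::uint64_t fixedFrameSize(std::uint64_t payloadBytes,
                                    std::uint64_t incomingAlign,
                                    std::uint64_t wantedAlign) {
  assert(wantedAlign >= incomingAlign);
  return payloadBytes + (wantedAlign - incomingAlign);
}

// Hypothetical helper: the dynamically chosen, aligned base inside the frame.
// Rounds `sp` up to the next multiple of `wantedAlign` (a power of two).
inline std::uint64_t alignedBase(std::uint64_t sp, std::uint64_t wantedAlign) {
  assert(wantedAlign != 0 && (wantedAlign & (wantedAlign - 1)) == 0);
  return (sp + wantedAlign - 1) & ~(wantedAlign - 1);
}
```

For a 64-byte payload with 16-byte incoming and 32-byte wanted alignment, the frame is always 80 bytes; only the position of the aligned base within those 80 bytes varies from call to call, so the stack pointer adjustment itself stays fixed.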
Herbie Robinson via llvm-dev
2015-Aug-31 03:39 UTC
[llvm-dev] Aligned vector spills and variably sized stack frames
If one uses that trick, one should combine all the items needing the large alignment into one allocation. Otherwise, one will be allocating extra space all over the place, along with needing a pointer variable for every aligned object.

On 8/30/15 9:09 PM, Joseph Tremoulet via llvm-dev wrote:
> I recently saw another compiler use a "cute trick" for this sort of thing: allocate a fixed amount of space on the stack by including the worst-case padding, and then dynamically set the frame pointer to an aligned location within that.
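The cost difference between padding every overaligned object separately and padding one combined block can be sketched with a little arithmetic. This is a simplified model, not a description of any particular compiler: the function names are hypothetical, and it assumes every object's size is a multiple of the wanted alignment (true for 32-byte vector spill slots), so objects packed back to back in one block stay aligned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: each of `count` objects gets its own worst-case
// padding (and, implicitly, its own dynamically computed base pointer).
inline std::uint64_t paddingIfSeparate(std::uint64_t count,
                                       std::uint64_t incomingAlign,
                                       std::uint64_t wantedAlign) {
  assert(wantedAlign >= incomingAlign);
  return count * (wantedAlign - incomingAlign);
}

// Hypothetical model: all objects share one block, so the worst-case padding
// is paid once and a single aligned base pointer addresses everything.
inline std::uint64_t paddingIfCombined(std::uint64_t count,
                                       std::uint64_t incomingAlign,
                                       std::uint64_t wantedAlign) {
  assert(wantedAlign >= incomingAlign);
  return count == 0 ? 0 : (wantedAlign - incomingAlign);
}
```

With four 32-byte slots on a 16-byte-aligned stack, this model reserves 64 bytes of padding in the separate case versus 16 bytes in the combined case.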
Reid Kleckner via llvm-dev
2015-Aug-31 16:36 UTC
[llvm-dev] Aligned vector spills and variably sized stack frames
On Sun, Aug 30, 2015 at 6:09 PM, Joseph Tremoulet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I recently saw another compiler use a "cute trick" for this sort of thing:
> allocate a fixed amount of space on the stack by including the worst-case
> padding, and then dynamically set the frame pointer to an aligned location
> within that.

I came here to suggest this approach also. :)

Right now the X86 backend is using a stack realignment prologue that is designed to fix up the incoming and outgoing alignment to some number. We only need this prologue when the user is telling us that the incoming alignment is too low, and it must be fixed up (i.e. -mstackrealign).