Reid Kleckner via llvm-dev
2020-Jan-26 23:12 UTC
[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated
Hello all,

A few years ago, I added the inalloca feature to LLVM IR so that Clang could be C++ ABI compatible with MSVC on 32-bit x86. The feature works, but there is room for improvement. I recently took the time to write up a design using token values that will hopefully be better named and easier to work with and around.

For the technical details of the proposal, I've written up the RFC in Markdown here:
https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md

I've pasted the text below if you want to quote and reply on the list.

The main question I have for the community is, given that it is infeasible to upgrade inalloca to llvm.call.setup, can we drop support for the old IR? So far as I am aware, Clang has been the only user of this attribute, and only for i*86-windows-msvc targets. The main goal of adding llvm.call.setup is to get rid of inalloca, so if all LLVM IR transforms have to keep knowing about it, there is no reason to add a new way to do the same thing.

Thanks,
Reid

The last RFC:
http://lists.llvm.org/pipermail/llvm-dev/2013-July/064218.html
Current inalloca docs:
https://llvm.org/docs/InAlloca.html

----------------

# RFC: Replace inalloca with llvm.call.setup and preallocated

In order to pass non-trivially copyable objects by value in a way that is compatible with the Visual C++ compiler on 32-bit x86, Clang has to be able to separate the allocation of call argument memory from the call site. The `inalloca` feature was added to LLVM for this purpose. However, this feature usually results in inefficient code, and is often incompatible with otherwise straightforward LLVM IR transforms. Therefore, I would like to get rid of it and replace it with `llvm.call.setup`, which I hope will be more maintainable in the long run.

These are some of the drawbacks of inalloca that I want to fix with the new IR:

- Blocks most interprocedural prototype changing transforms: dead argument elimination, argument promotion, application of fastcc, etc.
- Blocks alias analysis: Unrelated args in memory are hard to analyze
- Blocks function attribute inference for unrelated parameters
- Hard for frontends to use because one argument can affect every other argument

Since inalloca was added, the token type was added to LLVM. Transforms are not allowed to obscure the definition of a token value, and tokens can be used to create single-entry, multi-exit regions in LLVM IR. This allows the creation of a call setup operation that must remain paired with its call site throughout mid-level optimization. That guarantee allows the backend to look at the call site when emitting the call setup instructions. If the IR for a call setup forms a proper region, that can also help the backend perform frame pointer elimination in many cases.

## New IR Features

Here is the list of new IR features I think this requires:

Intrinsics:
- `token @llvm.call.setup(i32 %numArgs)`
- `i8* @llvm.call.alloc(token %site, i32 %argIdx)`
- `void @llvm.call.teardown(token)`

Attributes:
- `preallocated(<ty>)`, similar to byval, but no copy

Bundles:
- `[ "callsetup"(token %site) ]`

Verifier rules:
- `llvm.call.setup` must have exactly one corresponding call site
- call site must have equal number of preallocated args to `llvm.call.setup`

## Intended Usage

Here is an example of LLVM IR for an unpacked call site:

```llvm
%cs = call token @llvm.call.setup(i32 3)
%m0 = call i8* @llvm.call.alloc(token %cs, i32 0) ;; allocates {i32}
call void @llvm.memset*(i8* %m0, i32 0, i32 4)
%m1 = call i8* @llvm.call.alloc(token %cs, i32 1) ;; allocates {i32, i32}
call void @llvm.memset*(i8* %m1, i32 0, i32 8)
%m2 = call i8* @llvm.call.alloc(token %cs, i32 2) ;; allocates {i32, i32, i32}
call void @llvm.memset*(i8* %m2, i32 0, i32 12)
call void @use_callsetup(
    {i32}* preallocated({i32}) %m0,
    i32 13,
    {i32,i32}* preallocated({i32,i32}) %m1,
    i32 42,
    {i32,i32,i32}* preallocated({i32,i32,i32}) %m2)
    [ "callsetup"(token %cs) ]
```

Many transforms will need to be made aware of the new verifier guarantees, but they should only block optimizations on preallocated arguments. The goal is that unrelated arguments, such as the i32 arguments above, remain unaffected. DAE, for example, is free to eliminate the plain integers, but it cannot eliminate a preallocated argument without adjusting the call.setup.

The next most important thing to keep in mind is how this interacts with exception handling. This is where `llvm.call.teardown` comes into play. The idea is that, in a call region, clang should push an exceptional cleanup onto the cleanup stack. Here is what the IR for the previous example would look like, assuming each argument has a constructor that may throw:

```llvm
  %cs = call token @llvm.call.setup(i32 3)
  %m0 = call i8* @llvm.call.alloc(token %cs, i32 0)
  invoke void @ctor0(i8* %m0)
          to label %cont0 unwind label %cleanupCall
cont0:
  %m1 = call i8* @llvm.call.alloc(token %cs, i32 1)
  invoke void @ctor1(i8* %m1)
          to label %cont1 unwind label %cleanup0
cont1:
  %m2 = call i8* @llvm.call.alloc(token %cs, i32 2)
  invoke void @ctor2(i8* %m2)
          to label %cont2 unwind label %cleanup1
cont2:
  call void @use_callsetup(i8* preallocated %m0, i32 13, i8* preallocated %m1, i32 42, i8* preallocated %m2) [ "callsetup"(token %cs) ]
cleanup1:
  %cl1 = cleanuppad unwind to caller
  call void @dtor(i8* %m1) [ "funclet"(token %cl1) ]
  cleanupret %cl1 to label %cleanup0
cleanup0:
  %cl0 = cleanuppad unwind to caller
  call void @dtor(i8* %m0) [ "funclet"(token %cl0) ]
  cleanupret %cl0 to label %cleanupCall
cleanupCall:
  %clC = cleanuppad unwind to caller
  call void @llvm.call.teardown(token %cs) ;; Or llvm.call.cleanup?
  cleanupret %clC to caller
```

Generally, cleanups to tear down a call setup region are not needed if control cannot return to the current function.
The cleanupCall block is an example of such an unnecessary cleanup. However, to make things easy for the inliner, the frontend is required to emit these cleanups. Prior to code generation, the WinEHPrepare pass can remove any unneeded argument memory cleanups.

## Canonical Form

`llvm.call.setup` is designed for compatibility, not for performance or analyzability. Mid-level optimization passes should generally try to remove these intrinsics and attributes when possible. When it is possible to change the prototype of a function using these attributes, call sites should be canonicalized in the following way:

- Create a static alloca of appropriate type for each preallocated argument
- Replace all uses of `llvm.call.alloc` with the corresponding new alloca
- Insert `llvm.lifetime.start` for each static alloca at the original alloc site, and `llvm.lifetime.end` after the call and at `llvm.call.teardown`.
- Remove all setup, alloc, teardown intrinsics, and remove the `preallocated` attributes from caller and callee.

GlobalOpt seems like a logical place for this transform. This form should enable downstream optimizations, and reduce the number of transforms that need to be taught that `llvm.call.alloc` creates stack memory.

## Inliner Considerations

When inlining a call site that uses `llvm.call.setup`, the inliner should also make the transforms described above, only without adjusting the callee's prototype.

## Corner cases

### Catching an exception within a call region

If an exception is thrown and caught within the call setup region, the newly established SP must be saved into the EH record when a call is set up.
Consider the case below of inlining try / catch into a call region:

```c++
struct Foo {
  Foo(int x);
  Foo(const Foo &o);
  ~Foo();
  int x, y, z;
};
void use_callsetup(int, Foo, Foo);
int maythrow2();
static inline int maythrow() {
  try {
    return maythrow2();
  } catch (int) {}
  return 0;
}
Foo getf();
int main() {
  use_callsetup(maythrow(), getf(), getf());
}
```

The backend should be able to detect whether SP must be saved or not by checking if there are any invokes reachable along normal paths from the call setup that do not first reach the call itself, or a normal `llvm.call.teardown`.

### Non-exceptional call region exit

It is possible, using statement expressions, to exit a call setup region with normal control flow. In this case, Clang should emit a normal cleanup to call `llvm.call.teardown`. Consider:

```c++
#define RETURN_IF_ERR(maybeError) \
  ({ \
    auto val = maybeError; \
    if (val.isError()) \
      return val.getError(); \
    val.getValue(); \
  })
ErrorOr<int> mayFail();
void use_callsetup(int, Foo);
void f(bool cond) {
  use_callsetup(RETURN_IF_ERR(mayFail()), Foo());
}
```

### Inlining non-EH code into EH code

If exceptions are disabled, the frontend should not emit exceptional cleanups to tear down the call region. However, this code could be inlined into a function that uses exceptions, and the caller could catch an exception thrown through the non-EH code. Generally this should be impossible, because the calls will be marked nounwind, and it is UB to throw an exception through a nounwind call site. However, SEH allows exceptions to be thrown through nounwind call sites. Consider:

```c++
int mayThrow();
void use_callsetup(int, Foo);
static inline void inlineMe() {
  use_callsetup(mayThrow(), Foo());
}
void f(int numRetries) {
  for (int i = 0; i < numRetries; i++) {
    __try {
      inlineMe();
    } __except (1) {
    }
  }
}
```

Unless the compiler inserts stack adjustments along the unwind path, this will set up call argument memory for every loop iteration, and never clean it up.
This could result in code generation bugs or excessive stack memory use. The three options are:

1. Refuse to inline functions containing call regions through invoke call sites
1. Teach the inliner to synthesize llvm.call.teardown cleanups inside call regions
1. Do nothing, the frontend already disables inlining into `__try`

Given that this corner case seems specific to SEH, the third option seems most reasonable.

## Implementation Steps

In the spirit of incremental development, I think the implementation could be broken down into the following patch series:

- IR: LLVM IR intrinsics and attributes
- Clang: IRGen, under -cc1 flag
- X86: backend implementation: inefficient, no EH
- Transforms: Update inliner and globalopt, audit other transforms
- X86: MSVC C++ EH SP management
- ... test it on Chrome with -cc1 flag
- Clang: Remove cc1 flag and inalloca IRGen logic
- ... announce inalloca removal from LLVM IR
- inalloca removal

## Backwards compatibility

It may be possible to upgrade some bitcode from `inalloca` to `llvm.call.setup`, but in cases with complex control flow where the allocation site does not dominate the call, it will not be straightforward. Therefore, I think we should make an exception to LLVM's usual policy of bitcode backwards compatibility, and drop support for `inalloca`. The `inalloca` attribute was generally a source of bugs, and was only used for code targeting `i*86-windows-msvc`. To the best of my knowledge, there are no users of LLVM who archive bitcode for that platform and expect to be able to upgrade it for use with future LLVM versions.
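
As a rough illustration of the two verifier rules listed under "New IR Features", here is a minimal C++ sketch of the check over a toy model. The `CallSetup` and `CallSite` records and the `verifyCallSetups` function are hypothetical names made up for this example; they are not LLVM classes, and a real check would walk the users of the token value rather than compare integer ids.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy records, not LLVM classes.
// One CallSetup per llvm.call.setup(i32 numArgs); the token is modeled as an id.
struct CallSetup { int token; int numArgs; };
// One CallSite per call carrying a "callsetup"(token) bundle.
struct CallSite { int token; int numPreallocated; };

// Returns an empty string when both proposed rules hold, otherwise a
// description of the first violation found.
std::string verifyCallSetups(const std::vector<CallSetup>& setups,
                             const std::vector<CallSite>& sites) {
  // Group the preallocated-argument counts of the call sites by token.
  std::map<int, std::vector<int>> countsByToken;
  for (const CallSite& site : sites)
    countsByToken[site.token].push_back(site.numPreallocated);

  for (const CallSetup& setup : setups) {
    assert(setup.numArgs >= 0 && "argument count cannot be negative");
    const std::vector<int>& counts = countsByToken[setup.token];
    // Rule 1: exactly one corresponding call site.
    if (counts.size() != 1)
      return "setup " + std::to_string(setup.token) + " has " +
             std::to_string(counts.size()) + " call sites, expected 1";
    // Rule 2: the call site has as many preallocated args as the setup declared.
    if (counts[0] != setup.numArgs)
      return "setup " + std::to_string(setup.token) + " declares " +
             std::to_string(setup.numArgs) + " args, call site has " +
             std::to_string(counts[0]) + " preallocated";
  }
  return "";
}
```

In this model, a setup declaring three arguments paired with one call site carrying three preallocated arguments passes, while a duplicated or deleted call site, or a count mismatch, is reported.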
Eli Friedman via llvm-dev
2020-Jan-28 00:31 UTC
[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated
I assume by “drop support”, you mean reject it in the bitcode reader/IR parser? We can’t reasonably support a complex feature like inalloca if nobody is testing it. If we can’t reasonably upgrade it, and we don’t think there are any users other than clang targeting 32-bit Windows, probably dropping support is best.

More detailed comments on the proposal:

“llvm.call.setup must have exactly one corresponding call site”: Normal IR rules would allow cloning the call site (in jump threading), or erasing the call site (if there’s a noreturn call in an argument). What’s the benefit of enforcing this rule, as opposed to just saying all the call sites must have the same signature?

The proposal doesn’t address what happens if llvm.call.setup is called while there’s another llvm.call.setup still active. Is it legal to call llvm.call.setup in a loop? Or should nested llvm.call.setup calls have the parent callsetup token as an operand?

Is there some way we can allow optimizations if we can’t modify the callee, but we can prove nothing captures the address of the preallocated region before the call? I guess under the current proposal we could transform preallocated->byval, but that isn’t very exciting.

How does this interact with other dynamic stack allocations? Should we switch VLAs to use a similar mechanism? (The problems with dynamic alloca in general aren’t as terrible, but it might still benefit: for example, it’s much easier to transform a dynamic allocation into a static allocation.)

“If an exception is thrown and caught within the call setup region, the newly established SP must be saved into the EH record when a call is setup.” What makes this case special vs. what we currently implement? Is this currently broken? Or is it related to supporting frame pointer elimination?

-Eli
Reid Kleckner via llvm-dev
2020-Jan-28 00:57 UTC
[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated
On Mon, Jan 27, 2020 at 4:31 PM Eli Friedman <efriedma at quicinc.com> wrote:

> I assume by “drop support”, you mean reject it in the bitcode reader/IR
> parser? We can’t reasonably support a complex feature like inalloca if
> nobody is testing it. If we can’t reasonably upgrade it, and we don’t think
> there are any users other than clang targeting 32-bit Windows, probably
> dropping support is best.

That's a good point. There are already enough lightly tested features in LLVM. There's no reason to leave another one lying around like a trap for the first unsuspecting user to try it.

> More detailed comments on the proposal:
>
> “llvm.call.setup must have exactly one corresponding call site”: Normal IR
> rules would allow cloning the call site (in jump threading), or erasing the
> call site (if there’s a noreturn call in an argument). What’s the benefit
> of enforcing this rule, as opposed to just saying all the call sites must
> have the same signature?

I think we could cope with unreachable code elimination deleting a paired call site (zero or one), but code duplication creating a second call site could be problematic. The call setup doesn't describe the prototype of the main call site, so if there were multiple call sites, the backend would have to pick one call site arbitrarily or compare the call sites when setting up the call.

If there are zero call sites, the backend can create static allocas of the appropriate type to satisfy the allocations. Of course, an IR pass (instcombine?) should do this transform first if it sees it. Maybe we could have CGP take care of it, too.

> The proposal doesn’t address what happens if llvm.call.setup is called
> while there’s another llvm.call.setup still active. Is it legal to call
> llvm.call.setup in a loop? Or should nested llvm.call.setup calls have the
> parent callsetup token as an operand?

Nested setup is OK, but the verifier rule that there must be a paired call site should make it impossible to do in a loop.
I guess we should have some rule to reject the following:

    %cs1 = llvm.call.setup()
    %cs2 = llvm.call.setup()
    call void @cs1() [ "callsetup"(token %cs1) ]
    call void @cs2() [ "callsetup"(token %cs2) ]

> Is there some way we can allow optimizations if we can’t modify the
> callee, but we can prove nothing captures the address of the preallocated
> region before the call? I guess under the current proposal we could
> transform preallocated->byval, but that isn’t very exciting.

I suppose we could say that the combo of byval+preallocated just means `byval`, and teach transforms that that's OK.

> How does this interact with other dynamic stack allocations? Should we
> switch VLAs to use a similar mechanism? (The problems with dynamic alloca
> in general aren’t as terrible, but it might still benefit: for example,
> it’s much easier to transform a dynamic allocation into a static
> allocation.)

VLAs could use something like this, but they are generally of unknown size while call sites have a known fixed size. I think that makes them pretty different.

> “If an exception is thrown and caught within the call setup region, the
> newly established SP must be saved into the EH record when a call is
> setup.” What makes this case special vs. what we currently implement? Is
> this currently broken? Or is it related to supporting frame pointer
> elimination?

I think of it as a special case because you can't write this in standard C++. Today, I think we leak stack memory in this case. There's no correctness issue because we copy SP into its own virtual register at the point of the alloca, and arguments are addressed relative to the vreg.

What I had in mind for the new system is that we make some kind of fixed stack object that uses pre-computed SP offsets, assuming there are no dynamic allocas in the function. This would be a problem for a program that does:

    setup call 1
    store call 1 arg 0
    try {
      setup call 2
      throw exception
      call 2
    } catch (...) {}
    ; call 2's frame is still on the stack
    store call 1 arg 1 ; SP offset would be incorrect
    call 1
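
The rule suggested above for rejecting interleaved setups could be sketched, very roughly, as a stack discipline check. The C++ below is a simplified model that assumes a single straight-line sequence of events; `Kind`, `Event`, and `properlyNested` are hypothetical names for illustration only, and a real verifier rule would have to reason about the control-flow graph instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of a straight-line instruction sequence:
// each event is either an llvm.call.setup or the call site that consumes
// its token. Tokens are modeled as non-negative integer ids.
enum class Kind { Setup, Call };
struct Event { Kind kind; int token; };

// Returns true if every call site consumes the most recently opened,
// still-active setup, i.e. call regions nest like a stack.
bool properlyNested(const std::vector<Event>& events) {
  std::vector<int> open;  // tokens of setups whose call has not been reached
  for (const Event& e : events) {
    assert(e.token >= 0 && "tokens are non-negative ids in this model");
    if (e.kind == Kind::Setup) {
      open.push_back(e.token);
      continue;
    }
    // A call must pair with the innermost active setup.
    if (open.empty() || open.back() != e.token)
      return false;
    open.pop_back();
  }
  // Setups left without a call site are not judged here; that case is
  // covered by the separate "paired call site" discussion.
  return true;
}
```

Under this model the sequence setup 1, setup 2, call 2, call 1 is accepted as proper nesting, while setup 1, setup 2, call 1, call 2 (the interleaved example above) is rejected.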