Xun Li via llvm-dev
2020-May-08 20:34 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
Hi, I was looking into the implementation of TailRecursionElimination, and noticed that we have the constrain that if any call uses a local, even though it doesn't capture the local, it would still prohibit TCE. This contain seems unnecessary and overly limiting? Relevant code is here: https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 Looking through the tests that cover this scenario (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), I found it referring to rdar://14324281 and PR962. What are they? These haven't been updated since 2014, so I wonder what is the latest state and have them been resolved? cc nicholas who authored the original code. Thanks! -- Xun
Nick Lewycky via llvm-dev
2020-May-08 20:53 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
On 2020-05-08 1:34 p.m., Xun Li wrote:> Hi, > > I was looking into the implementation of TailRecursionElimination, and > noticed that we have the constrain that if any call uses a local, even > though it doesn't capture the local, it would still prohibit TCE. This > contain seems unnecessary and overly limiting?I think it's a necessary limitation. The idea is that something marked 'tail' could be emitted as a jump instruction instead of a call instruction. In order to do that, the jumpee has to 'ret' to the same place our function would have returned to. Since the return value is the current top-of-the-stack value when returning[1], the jumper must remove any local variables -- in LLVM, allocas -- from the stack before jumping. This implies that the jumpee may not access our locals at all, it is not sufficient for the callee to merely not capture them. Further to that point, we have an optimization in BasicAA where we conclude that a call marked "tail" does not alias any allocas of the calling function. There's a missed optimization here regarding "notail". Because "tail" can be used as an optimization for BasicAA, we should actually permit functions to be marked "tail" (not accessing any of the callers stack) and still be marked "notail" (the machine code must be a call, not a jump). These aren't actually mutually exclusive, even though they sound like they are. Nick [1] - not true on Windows, but even there the number of bytes deep the return value is, is a fixed constant that is determined entirely by the callee. The jumper can't have extra bytes on the stack at the moment of the jump.> Relevant code is here: > https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 > > Looking through the tests that cover this scenario > (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), > I found it referring to rdar://14324281 and PR962. What are they? > These haven't been updated since 2014, so I wonder what is the latest > state and have them been resolved? > > cc nicholas who authored the original code. > > Thanks! >
Eli Friedman via llvm-dev
2020-May-08 21:23 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
Are you specifically asking about AllCallsAreTailCalls? I agree that's overly limiting. If you're going to propose a patch to change it, you probably want to be careful that we don't introduce dynamic allocas. (PR962 is https://bugs.llvm.org/show_bug.cgi?id=962 . rdar is a reference into the Apple internal bugtracker.) -Eli -----Original Message----- From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Xun Li via llvm-dev Sent: Friday, May 8, 2020 1:34 PM To: llvm-dev at lists.llvm.org Cc: nicholas at mxc.ca Subject: [EXT] [llvm-dev] Noncapture use of locals disabling TailRecursionElimination Hi, I was looking into the implementation of TailRecursionElimination, and noticed that we have the constrain that if any call uses a local, even though it doesn't capture the local, it would still prohibit TCE. This contain seems unnecessary and overly limiting? Relevant code is here: https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 Looking through the tests that cover this scenario (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), I found it referring to rdar://14324281 and PR962. What are they? These haven't been updated since 2014, so I wonder what is the latest state and have them been resolved? cc nicholas who authored the original code. Thanks! -- Xun _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Xun Li via llvm-dev
2020-May-08 21:58 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
Eli, Yes I was referring to AllCallsAreTailCalls. I will take a look at how to improve this. Nick, Thanks. I agree that's the proper constrain to mark a call as tailcall, however not being able to mark a call as tailcall shouldn't completely kill TCE. (i.e. AllCallsAreTailCalls seems overly limiting). Specifically, I have been looking into an issue where Clang cannot TCE the following code: class Foo { public: __attribute__((noinline)) ~Foo() {} }; void callee() { Foo f; } void funcRecurse() { callee(); funcRecurse(); } int main() { funcRecurse(); } In the above code, callee will be inlined into funcRecurse, generating calls to ~Foo() along with lifetime management intrinsics, all using locals, disabling TCE. Clang can successfully TCE if we force callee to not inline. On Fri, May 8, 2020 at 1:53 PM Nick Lewycky <nicholas at mxc.ca> wrote:> > On 2020-05-08 1:34 p.m., Xun Li wrote: > > Hi, > > > > I was looking into the implementation of TailRecursionElimination, and > > noticed that we have the constrain that if any call uses a local, even > > though it doesn't capture the local, it would still prohibit TCE. This > > contain seems unnecessary and overly limiting? > > I think it's a necessary limitation. The idea is that something marked > 'tail' could be emitted as a jump instruction instead of a call > instruction. In order to do that, the jumpee has to 'ret' to the same > place our function would have returned to. Since the return value is the > current top-of-the-stack value when returning[1], the jumper must remove > any local variables -- in LLVM, allocas -- from the stack before > jumping. This implies that the jumpee may not access our locals at all, > it is not sufficient for the callee to merely not capture them. > > Further to that point, we have an optimization in BasicAA where we > conclude that a call marked "tail" does not alias any allocas of the > calling function. > > There's a missed optimization here regarding "notail". Because "tail" > can be used as an optimization for BasicAA, we should actually permit > functions to be marked "tail" (not accessing any of the callers stack) > and still be marked "notail" (the machine code must be a call, not a > jump). These aren't actually mutually exclusive, even though they sound > like they are. > > Nick > > [1] - not true on Windows, but even there the number of bytes deep the > return value is, is a fixed constant that is determined entirely by the > callee. The jumper can't have extra bytes on the stack at the moment of > the jump. > > > Relevant code is here: > > https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 > > > > Looking through the tests that cover this scenario > > (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), > > I found it referring to rdar://14324281 and PR962. What are they? > > These haven't been updated since 2014, so I wonder what is the latest > > state and have them been resolved? > > > > cc nicholas who authored the original code. > > > > Thanks! > > >-- Xun
Possibly Parallel Threads
- Noncapture use of locals disabling TailRecursionElimination
- Noncapture use of locals disabling TailRecursionElimination
- [LLVMdev] Preserving ProfileInfo in several Passes
- [PATCH] D12923: Add support for function attribute "notail"
- [LLVMdev] Preserving ProfileInfo in several Passes