Nick Lewycky via llvm-dev
2020-May-08 20:53 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
On 2020-05-08 1:34 p.m., Xun Li wrote:> Hi, > > I was looking into the implementation of TailRecursionElimination, and > noticed that we have the constrain that if any call uses a local, even > though it doesn't capture the local, it would still prohibit TCE. This > contain seems unnecessary and overly limiting?I think it's a necessary limitation. The idea is that something marked 'tail' could be emitted as a jump instruction instead of a call instruction. In order to do that, the jumpee has to 'ret' to the same place our function would have returned to. Since the return value is the current top-of-the-stack value when returning[1], the jumper must remove any local variables -- in LLVM, allocas -- from the stack before jumping. This implies that the jumpee may not access our locals at all, it is not sufficient for the callee to merely not capture them. Further to that point, we have an optimization in BasicAA where we conclude that a call marked "tail" does not alias any allocas of the calling function. There's a missed optimization here regarding "notail". Because "tail" can be used as an optimization for BasicAA, we should actually permit functions to be marked "tail" (not accessing any of the callers stack) and still be marked "notail" (the machine code must be a call, not a jump). These aren't actually mutually exclusive, even though they sound like they are. Nick [1] - not true on Windows, but even there the number of bytes deep the return value is, is a fixed constant that is determined entirely by the callee. The jumper can't have extra bytes on the stack at the moment of the jump.> Relevant code is here: > https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 > > Looking through the tests that cover this scenario > (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), > I found it referring to rdar://14324281 and PR962. What are they? > These haven't been updated since 2014, so I wonder what is the latest > state and have them been resolved? > > cc nicholas who authored the original code. > > Thanks! >
Nick Lewycky via llvm-dev
2020-May-08 23:38 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
On 2020-05-08 2:58 p.m., Xun Li wrote:> Eli, > Yes I was referring to AllCallsAreTailCalls. I will take a look at how > to improve this. > > Nick, > Thanks. I agree that's the proper constrain to mark a call as > tailcall, however not being able to mark a call as tailcall shouldn't > completely kill TCE. (i.e. AllCallsAreTailCalls seems overly > limiting).I get it now. TailRecursionElimination.cpp does two optimizations, marking of tail and converting recursion to loops. I thought you were proposing a change to the marking of tail. Thanks for the example! "PR962" refers to "problem report 962" or https://llvm.org/PR962 . "rdar" is Apple's "radar" bug tracking system, these are generally internal to their company but sometimes they're available in whole or in part on https://openradar.appspot.com/ . Nick> Specifically, I have been looking into an issue where Clang cannot TCE > the following code: > > class Foo { > public: > __attribute__((noinline)) > ~Foo() {} > }; > > void callee() { > Foo f; > } > > void funcRecurse() { > callee(); > funcRecurse(); > } > > int main() { > funcRecurse(); > } > > In the above code, callee will be inlined into funcRecurse, generating > calls to ~Foo() along with lifetime management intrinsics, all using > locals, disabling TCE. Clang can successfully TCE if we force callee > to not inline. > > On Fri, May 8, 2020 at 1:53 PM Nick Lewycky <nicholas at mxc.ca> wrote: >> On 2020-05-08 1:34 p.m., Xun Li wrote: >>> Hi, >>> >>> I was looking into the implementation of TailRecursionElimination, and >>> noticed that we have the constrain that if any call uses a local, even >>> though it doesn't capture the local, it would still prohibit TCE. This >>> contain seems unnecessary and overly limiting? >> I think it's a necessary limitation. The idea is that something marked >> 'tail' could be emitted as a jump instruction instead of a call >> instruction. In order to do that, the jumpee has to 'ret' to the same >> place our function would have returned to. Since the return value is the >> current top-of-the-stack value when returning[1], the jumper must remove >> any local variables -- in LLVM, allocas -- from the stack before >> jumping. This implies that the jumpee may not access our locals at all, >> it is not sufficient for the callee to merely not capture them. >> >> Further to that point, we have an optimization in BasicAA where we >> conclude that a call marked "tail" does not alias any allocas of the >> calling function. >> >> There's a missed optimization here regarding "notail". Because "tail" >> can be used as an optimization for BasicAA, we should actually permit >> functions to be marked "tail" (not accessing any of the callers stack) >> and still be marked "notail" (the machine code must be a call, not a >> jump). These aren't actually mutually exclusive, even though they sound >> like they are. >> >> Nick >> >> [1] - not true on Windows, but even there the number of bytes deep the >> return value is, is a fixed constant that is determined entirely by the >> callee. The jumper can't have extra bytes on the stack at the moment of >> the jump. >> >>> Relevant code is here: >>> https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 >>> >>> Looking through the tests that cover this scenario >>> (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), >>> I found it referring to rdar://14324281 and PR962. What are they? >>> These haven't been updated since 2014, so I wonder what is the latest >>> state and have them been resolved? >>> >>> cc nicholas who authored the original code. >>> >>> Thanks! >>> >
Reasonably Related Threads
- Noncapture use of locals disabling TailRecursionElimination
- Noncapture use of locals disabling TailRecursionElimination
- [PATCH] D12923: Add support for function attribute "notail"
- [PATCH] D12923: Add support for function attribute "notail"
- [LLVMdev] CMake problem of LLVM 3.3