Nick Lewycky via llvm-dev
2020-May-08 23:38 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
On 2020-05-08 2:58 p.m., Xun Li wrote:> Eli, > Yes I was referring to AllCallsAreTailCalls. I will take a look at how > to improve this. > > Nick, > Thanks. I agree that's the proper constrain to mark a call as > tailcall, however not being able to mark a call as tailcall shouldn't > completely kill TCE. (i.e. AllCallsAreTailCalls seems overly > limiting).I get it now. TailRecursionElimination.cpp does two optimizations, marking of tail and converting recursion to loops. I thought you were proposing a change to the marking of tail. Thanks for the example! "PR962" refers to "problem report 962" or https://llvm.org/PR962 . "rdar" is Apple's "radar" bug tracking system, these are generally internal to their company but sometimes they're available in whole or in part on https://openradar.appspot.com/ . Nick> Specifically, I have been looking into an issue where Clang cannot TCE > the following code: > > class Foo { > public: > __attribute__((noinline)) > ~Foo() {} > }; > > void callee() { > Foo f; > } > > void funcRecurse() { > callee(); > funcRecurse(); > } > > int main() { > funcRecurse(); > } > > In the above code, callee will be inlined into funcRecurse, generating > calls to ~Foo() along with lifetime management intrinsics, all using > locals, disabling TCE. Clang can successfully TCE if we force callee > to not inline. > > On Fri, May 8, 2020 at 1:53 PM Nick Lewycky <nicholas at mxc.ca> wrote: >> On 2020-05-08 1:34 p.m., Xun Li wrote: >>> Hi, >>> >>> I was looking into the implementation of TailRecursionElimination, and >>> noticed that we have the constrain that if any call uses a local, even >>> though it doesn't capture the local, it would still prohibit TCE. This >>> contain seems unnecessary and overly limiting? >> I think it's a necessary limitation. The idea is that something marked >> 'tail' could be emitted as a jump instruction instead of a call >> instruction. In order to do that, the jumpee has to 'ret' to the same >> place our function would have returned to. Since the return value is the >> current top-of-the-stack value when returning[1], the jumper must remove >> any local variables -- in LLVM, allocas -- from the stack before >> jumping. This implies that the jumpee may not access our locals at all, >> it is not sufficient for the callee to merely not capture them. >> >> Further to that point, we have an optimization in BasicAA where we >> conclude that a call marked "tail" does not alias any allocas of the >> calling function. >> >> There's a missed optimization here regarding "notail". Because "tail" >> can be used as an optimization for BasicAA, we should actually permit >> functions to be marked "tail" (not accessing any of the callers stack) >> and still be marked "notail" (the machine code must be a call, not a >> jump). These aren't actually mutually exclusive, even though they sound >> like they are. >> >> Nick >> >> [1] - not true on Windows, but even there the number of bytes deep the >> return value is, is a fixed constant that is determined entirely by the >> callee. The jumper can't have extra bytes on the stack at the moment of >> the jump. >> >>> Relevant code is here: >>> https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 >>> >>> Looking through the tests that cover this scenario >>> (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), >>> I found it referring to rdar://14324281 and PR962. What are they? >>> These haven't been updated since 2014, so I wonder what is the latest >>> state and have them been resolved? >>> >>> cc nicholas who authored the original code. >>> >>> Thanks! >>> >
Xun Li via llvm-dev
2020-May-09 00:13 UTC
[llvm-dev] Noncapture use of locals disabling TailRecursionElimination
I see. Thank you Nick for the detailed explanations! I understand it now. On Fri, May 8, 2020 at 4:38 PM Nick Lewycky <nicholas at mxc.ca> wrote:> > On 2020-05-08 2:58 p.m., Xun Li wrote: > > Eli, > > Yes I was referring to AllCallsAreTailCalls. I will take a look at how > > to improve this. > > > > Nick, > > Thanks. I agree that's the proper constrain to mark a call as > > tailcall, however not being able to mark a call as tailcall shouldn't > > completely kill TCE. (i.e. AllCallsAreTailCalls seems overly > > limiting). > > I get it now. TailRecursionElimination.cpp does two optimizations, > marking of tail and converting recursion to loops. I thought you were > proposing a change to the marking of tail. Thanks for the example! > > "PR962" refers to "problem report 962" or https://llvm.org/PR962 . > "rdar" is Apple's "radar" bug tracking system, these are generally > internal to their company but sometimes they're available in whole or in > part on https://openradar.appspot.com/ . > > Nick > > > Specifically, I have been looking into an issue where Clang cannot TCE > > the following code: > > > > class Foo { > > public: > > __attribute__((noinline)) > > ~Foo() {} > > }; > > > > void callee() { > > Foo f; > > } > > > > void funcRecurse() { > > callee(); > > funcRecurse(); > > } > > > > int main() { > > funcRecurse(); > > } > > > > In the above code, callee will be inlined into funcRecurse, generating > > calls to ~Foo() along with lifetime management intrinsics, all using > > locals, disabling TCE. Clang can successfully TCE if we force callee > > to not inline. > > > > On Fri, May 8, 2020 at 1:53 PM Nick Lewycky <nicholas at mxc.ca> wrote: > >> On 2020-05-08 1:34 p.m., Xun Li wrote: > >>> Hi, > >>> > >>> I was looking into the implementation of TailRecursionElimination, and > >>> noticed that we have the constrain that if any call uses a local, even > >>> though it doesn't capture the local, it would still prohibit TCE. This > >>> contain seems unnecessary and overly limiting? > >> I think it's a necessary limitation. The idea is that something marked > >> 'tail' could be emitted as a jump instruction instead of a call > >> instruction. In order to do that, the jumpee has to 'ret' to the same > >> place our function would have returned to. Since the return value is the > >> current top-of-the-stack value when returning[1], the jumper must remove > >> any local variables -- in LLVM, allocas -- from the stack before > >> jumping. This implies that the jumpee may not access our locals at all, > >> it is not sufficient for the callee to merely not capture them. > >> > >> Further to that point, we have an optimization in BasicAA where we > >> conclude that a call marked "tail" does not alias any allocas of the > >> calling function. > >> > >> There's a missed optimization here regarding "notail". Because "tail" > >> can be used as an optimization for BasicAA, we should actually permit > >> functions to be marked "tail" (not accessing any of the callers stack) > >> and still be marked "notail" (the machine code must be a call, not a > >> jump). These aren't actually mutually exclusive, even though they sound > >> like they are. > >> > >> Nick > >> > >> [1] - not true on Windows, but even there the number of bytes deep the > >> return value is, is a fixed constant that is determined entirely by the > >> callee. The jumper can't have extra bytes on the stack at the moment of > >> the jump. > >> > >>> Relevant code is here: > >>> https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275 > >>> > >>> Looking through the tests that cover this scenario > >>> (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66), > >>> I found it referring to rdar://14324281 and PR962. What are they? > >>> These haven't been updated since 2014, so I wonder what is the latest > >>> state and have them been resolved? > >>> > >>> cc nicholas who authored the original code. > >>> > >>> Thanks! > >>> > > >-- Xun