On Mon, Feb 23, 2009 at 2:13 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
> Moreover, I now have evidence that LLVM is not behaving as you expect:
>
> 3. Adjusting the return value from this function into sret form results in
> tail call elimination being performed correctly. Note that this is still
> passing a first-class struct by value as an argument to a function through a
> tail call.

Yes, I was wrong: the problem is of a different kind and has nothing to do
with the alloca. As you say, the struct is passed by value; that is not the
problem. For example, the following code (which contains no allocas) will
also not be tail call optimized:

define fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* }, i32) {
entry:
  %2 = tail call fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* } %0, i32 %1)
  ret { { i8*, i8* }*, i8* } %2
}

The problem has to do with how struct returns are represented internally by
LLVM (in the SelectionDAG) and how the tail call optimization implementation
checks whether it may perform the tail call. The implementation checks that
the <call> node is immediately followed by a <ret> node. A struct return
causes the insertion of a <merge_values> node between the tail call node and
the return node. Because of the intermediate node the backend believes it
must not optimize the tail call:

[result1, result2] = <call ... >
[merged] = <merge_values [result1, result2]>
<ret [merged]>

So the current situation is that tail calls on functions that return a
struct are not optimized (as you correctly observed ;). Sorry.

regards arnold
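For reference, a minimal sketch of what the sret-form workaround Jon
describes might look like for this function; the @init_sret name and the
exact placement of the sret attribute are illustrative, not taken from the
original code:

; hypothetical sret-form rewrite of @init (the name is made up)
define fastcc void @init_sret({ { i8*, i8* }*, i8* }* sret %agg.result, { { i8*, i8* }*, i8* } %0, i32 %1) {
entry:
  ; the aggregate is written through the hidden sret pointer instead of being
  ; returned in registers, so the ret carries no result operands
  tail call fastcc void @init_sret({ { i8*, i8* }*, i8* }* sret %agg.result, { { i8*, i8* }*, i8* } %0, i32 %1)
  ret void
}

Because the return here is void, no <merge_values> node can appear between
the call and the ret, which is presumably why the tail call is eliminated in
this form, as Jon observed.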
On Monday 23 February 2009 16:14:57 Arnold Schwaighofer wrote:
> The problem has to do with how struct returns are represented
> internally by llvm (in the SelectionDAG) and how the tail call
> optimization implementation checks if it may perform the tail call.
> The implementation checks that the <call> node is immediately followed
> by a <ret> node. A struct return causes the insertion of a
> <merge_values> node between the tail call instruction and the return
> instruction node. Because of the intermediate node the backend
> believes it must not optimize the tail call.
>
> [result1, result2] = <call ... >
> [merged] = <merge_values [result1, result2]>
> <ret [merged]>
>
> So the current situation is that tail calls on functions that return a
> struct are not optimized (as you correctly observed ;). Sorry.

Thanks for the clarification. That makes a lot more sense!

LLVM's support for structs is wonderful, but I don't think they can be
called "first-class structs" until all such arbitrary restrictions have been
removed, even though the workaround (using sret form) is trivial in this
case.

Shall I file this as a bug against LLVM?

Assuming this gets fixed, will it be more efficient than the sret form?

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
On Tue, Feb 24, 2009 at 11:50 AM, Jon Harrop <jon at ffconsultancy.com> wrote:
> Thanks for the clarification. That makes a lot more sense!
>
> LLVM's support for structs is wonderful but I don't think they can be
> called "first-class structs" until all such arbitrary restrictions have
> been removed, even though the workaround (using sret form) is trivial in
> this case.
>
> Shall I file this as a bug against LLVM?

Yes, please do, and maybe include the following code, which already exhibits
the error, in the bug report:

define fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* }, i32) {
entry:
  %2 = tail call fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* } %0, i32 %1)
  ret { { i8*, i8* }*, i8* } %2
}

> Assuming this gets fixed, will it be more efficient than the sret form?

Don't expect this to be fixed soon, as I am currently on a leave of absence
from LLVM, busy with bringing tail calls to another VM and writing a thesis
about it ;).

Whether it will be more efficient I can't answer offhand, sorry. But
probably not, because the generated code should be quite similar. For the
sret version, the move of the result is performed before the return:

  store { i8*, i8* } %15, { i8*, i8* }* %19
  ret i32 0

For the struct-return version, this move would be performed as part of
moving the result from the result registers to whatever virtual register is
expecting the result. If the register allocator decides to merge the virtual
register with the result registers, then no further move is needed.

  %2 = tail call fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* } %0, i32 %1)

Note that if you have a series of sequential recursive tail calls, this move
will only be performed once (at the bottom of the recursion, that is, when
the recursion returns), so its impact on performance should be minimal.

regards arnold
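For illustration, a hedged sketch of how a non-tail caller might consume the
aggregate return value; the @use_init name is made up, and the extractvalue
is only meant to show where the copy out of the result registers would land:

define fastcc i8* @use_init({ { i8*, i8* }*, i8* } %s, i32 %n) {
entry:
  ; hypothetical caller: the aggregate comes back in @init's result registers
  %r = call fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* } %s, i32 %n)
  ; the move out of the result registers happens when a field is used
  %p = extractvalue { { i8*, i8* }*, i8* } %r, 1
  ret i8* %p
}

If the register allocator assigns %r's components to the physical result
registers, the extractvalue costs nothing extra, which is the merging case
described above.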
On Feb 24, 2009, at 2:50 AM, Jon Harrop wrote:
> On Monday 23 February 2009 16:14:57 Arnold Schwaighofer wrote:
>>
>> So the current situation is that tail calls on functions that return a
>> struct are not optimized (as you correctly observed ;). Sorry.
>
> Thanks for the clarification. That makes a lot more sense!
>
> LLVM's support for structs is wonderful but I don't think they can be
> called "first-class structs" until all such arbitrary restrictions have
> been removed, even though the workaround (using sret form) is trivial in
> this case.

"First-class" in this context means "values can have this type", not
"everything that anyone might want is supported in all backends."

A meta issue is that LLVM IR differs from many virtual machine ISAs in that
it's more a language for expressing programs than a specification of what
some particular machine can actually execute. When a particular LLVM
component doesn't support a particular aspect of the language, that's
considered a bug in the component, and not necessarily a problem in the
language.

This arrangement is convenient for people developing LLVM itself, because
new language features can be added without requiring that all backends and
execution engines support every feature in full generality, which isn't
always immediately useful, and sometimes not even meaningful. However, it's
inconvenient for people using LLVM, because there's no complete list of
which features are actually supported on any given backend. Some features
that are commonly unsupported are now mentioned in LangRef.html, though that
list is certainly not complete.

One way this issue might be addressed is with an LLVM IR conformance test
suite. Covering the whole language would be a big project, but the work
could be done incrementally. I wonder if people would be interested in
working on this.

> Shall I file this as a bug against LLVM?

Yes, thanks.

Dan
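To make the test-suite idea concrete, a conformance-style test for this
particular case might look something like the sketch below. The RUN line
follows the grep-based style used in llvm/test at the time; -march=x86 and
-tailcallopt are real llc options, but the exact grep pattern is an
assumption about what the optimized output would contain:

; RUN: llvm-as < %s | llc -march=x86 -tailcallopt | grep TAILCALL
; hypothetical test: a fastcc tail call returning a first-class struct
; should still be lowered as a tail call (the TAILCALL marker is assumed)
define fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* }, i32) {
entry:
  %2 = tail call fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* } %0, i32 %1)
  ret { { i8*, i8* }*, i8* } %2
}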