On Monday 23 February 2009 12:08:00 Arnold Schwaighofer
wrote:> Hello Duncan and Jon,
>
> I am the criminal responsible for the tail call implementation in the
> backends.
Thanks! :-)
> On Mon, Feb 23, 2009 at 9:17 AM, Duncan Sands <baldrick at free.fr>
wrote:
> > Hi Jon,
> >
> >> >From what I have understood of the LLVM docs about when tail
calls get
> >>
> >> eliminated on x86 and x64 it should be a tail call, yes.
>
> See below.
>
> > this list is for the code generator, and it seems obviously
incomplete:
> > it makes no mention of local variables (alloca).  Probably it is
> > implicitly assuming that the call was marked "tail call" by
the LLVM
> > optimizers.  So you also need to check under what conditions the LLVM
> > optimizers do that.
>
> The backend does not check the following condition but assumes it to be
> true.
>
> >From <http://llvm.org/docs/LangRef.html#i_call>:
>
> The optional "tail" marker indicates whether the callee function
> accesses any allocas or varargs in the caller. If the "tail"
marker is
> present, the function call is eligible for tail call optimization.
> Note that calls may be marked "tail" even if they do not occur
before
> a ret instruction.
>
> If the llvm IR is generated by a front end (HLVM) the front end must
> guarantee this condition.
Right, that is in agreement with my observations:
1. Blindly setting "tail" everywhere in HLVM breaks some of LLVM's
optimization passes (which blindly assume it to be correct) and culminates in 
the generation of broken code.
2. Passing a reference to a local struct on the current stack frame through a 
tail call by reference is *not* detected as an error by LLVM and produces 
code that corrupts data by overwriting the memory locations of the struct 
from the caller with whatever the callee uses the same stack frame for.
> >> > I didn't look closely but at a glance it seems to be
passing a local
> >> > stack variable as a call parameter.
>
> To me, also only glancing over the code and with a bit rusty llvm ir
> knowledge, it seems that %14 (the value passed to the tail call)is
> allocated in the stack frame  (alloca) of the calling function. So the
> above condition is violated.
>
> %3   = alloca { { i8*, i8* }*, i8* }
> %13 = getelementptr { { i8*, i8* }*, i8* }* %3, i32 0
> %14 = load { { i8*, i8* }*, i8* }* %13
> %17 = tail call fastcc { { i8*, i8* }*, i8* }
>      @init({ { i8*, i8* }*,i8*} %14, i32 %16)
I do not understand. Surely you should be testing for structs locally 
allocated on the caller's frame being passed *by reference* to the callee?
In
this case, the struct is passed by value so the fact that it came from the 
stack should be irrelevant: that stack space is not being depended upon 
because the fields of the struct are now in registers.
This is all swinging on my interpretation of first-class structs though. I am 
under the impression that they are essentially a set of registers holding 
their first-class fields. So you can pass them by value as arguments to 
functions with the behaviour of any other first-class type regardless of 
where they came from.
> >> Is this a bug in LLVM?
> >
> > Could be, but first I'd like to be sure that you are not misusing
tail
> > calls.
>
> Assuming that i am right and a pointer to alloca'ed memory is passed
> this is not a bug but a misuse of the 'tail' instruction.
I disagree. I think the fields of the struct should be passed by value with no 
pointer to the original alloca'd data.
Moreover, I now have evidence that LLVM is not behaving as you expect:
3. Adjusting the return value from this function into sret form results in 
tail call elimination being performed correctly. Note that this is still 
passing a first-class struct by value as an argument to a function through a 
tail call.
Here is my code that does get tail call eliminated correctly:
define fastcc i32 @init({ i8*, i8* }*, { i8*, i8* }, i32) {
entry:
	%3 = alloca { i32, { i8*, i8* } }		; <{ i32, { i8*, i8* } }*> [#uses=3]
	%4 = alloca { i8*, i8* }		; <{ i8*, i8* }*> [#uses=3]
	br label %start
start:		; preds = %entry
	%5 = bitcast { i32, { i8*, i8* } }* %3 to i32*		; <i32*> [#uses=1]
	store i32 %2, i32* %5
	%6 = getelementptr { i32, { i8*, i8* } }* %3, i32 0, i32 1		; <{ i8*, i8*
}*>
[#uses=1]
	store { i8*, i8* } %1, { i8*, i8* }* %6
	%7 = bitcast { i32, { i8*, i8* } }* %3 to { i32, { i8*, i8* } }*		; <{ i32,
{
i8*, i8* } }*> [#uses=1]
	%8 = load { i32, { i8*, i8* } }* %7		; <{ i32, { i8*, i8* } }> [#uses=1]
	%9 = malloc { i32, { i8*, i8* } }		; <{ i32, { i8*, i8* } }*> [#uses=2]
	%10 = bitcast { i32, { i8*, i8* } }* %9 to { i32, { i8*, i8* } }*		; <{ i32,
{ i8*, i8* } }*> [#uses=1]
	store { i32, { i8*, i8* } } %8, { i32, { i8*, i8* } }* %10
	%11 = bitcast { i32, { i8*, i8* } }* %9 to i8*		; <i8*> [#uses=1]
	%12 = bitcast { i8*, i8* }* %4 to i8**		; <i8**> [#uses=1]
	store i8* bitcast ({ i8*, i32 ({ i8*, i8* })* }* @Cons to i8*), i8** %12
	%13 = getelementptr { i8*, i8* }* %4, i32 0, i32 1		; <i8**> [#uses=1]
	store i8* %11, i8** %13
	%14 = bitcast { i8*, i8* }* %4 to { i8*, i8* }*		; <{ i8*, i8* }*>
[#uses=1]
	%15 = load { i8*, i8* }* %14		; <{ i8*, i8* }> [#uses=2]
	%16 = icmp eq i32 %2, 0		; <i1> [#uses=1]
	br i1 %16, label %pass, label %fail
fail:		; preds = %start
	%17 = sub i32 %2, 1		; <i32> [#uses=1]
	%18 = tail call fastcc i32 @init({ i8*, i8* }* %0, { i8*, i8* } %15, 
i32 %17)		; <i32> [#uses=1]
	ret i32 %18
pass:		; preds = %start
	%19 = bitcast { i8*, i8* }* %0 to { i8*, i8* }*		; <{ i8*, i8* }*>
[#uses=1]
	store { i8*, i8* } %15, { i8*, i8* }* %19
	ret i32 0
}
> Maybe we should add a link or a note on
> <http://llvm.org/docs/CodeGenerator.html#tailcallopt> to
> <http://llvm.org/docs/LangRef.html#i_call> that explains that the
call
> can only be marked as 'tail' if no pointers to alloca'ed memory
are
> passed.
The documentation should probably be clarified but I am leaning toward this 
being a bug in LLVM itself: tail call elimination should be done when the 
return value is a first-class struct.
Also, I said before that I had found first-class structs to be broken. In 
retrospect, I think I was probably observing this phenomenon and seeing a 
segmentation fault due to stack overflow caused by broken tail calls and not 
broken structs.
-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e