I have written a variety tests of tail calls for my HLVM and all passed with flying colors until I wrote this test (which is actually for algebraic datatypes) and discovered that it segfaults after ~100k iterations through what I think should be a tail call. Here's the IR: define fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* }, i32) { entry: %2 = alloca { i32, { { i8*, i8* }*, i8* } } ; <{ i32, { { i8 *, i8* }*, i8* } }*> [#uses=3] %3 = alloca { { i8*, i8* }*, i8* } ; <{ { i8*, i8* }*, i8* }*> [#uses=3] br label %start start: ; preds = %entry %4 = getelementptr { i32, { { i8*, i8* }*, i8* } }* %2, i32 0, i32 0 ; <i32*> [#uses=1] store i32 %1, i32* %4 %5 = getelementptr { i32, { { i8*, i8* }*, i8* } }* %2, i32 0, i32 1 ; <{ { i8*, i8* }*, i8* }*> [#uses=1] store { { i8*, i8* }*, i8* } %0, { { i8*, i8* }*, i8* }* %5 %6 = getelementptr { i32, { { i8*, i8* }*, i8* } }* %2, i32 0 ; <{ i32, { { i8*, i8* }*, i8* } }*> [#uses=1] %7 = load { i32, { { i8*, i8* }*, i8* } }* %6 ; <{ i32, { { i8 *, i8* }*, i8* } }> [#uses=1] %8 = malloc { i32, { { i8*, i8* }*, i8* } } ; <{ i32, { { i8 *, i8* }*, i8* } }*> [#uses=2] %9 = getelementptr { i32, { { i8*, i8* }*, i8* } }* %8, i32 0 ; <{ i32, { { i8*, i8* }*, i8* } }*> [#uses=1] store { i32, { { i8*, i8* }*, i8* } } %7, { i32, { { i8*, i8* }*, i8* } }* %9 %10 = getelementptr { { i8*, i8* }*, i8* }* %3, i32 0, i32 0 ; <{ i8*, i8* }**> [#uses=1] store { i8*, i8* }* @Cons, { i8*, i8* }** %10 %11 = bitcast { i32, { { i8*, i8* }*, i8* } }* %8 to i8* ; <i8*> [#uses=1] %12 = getelementptr { { i8*, i8* }*, i8* }* %3, i32 0, i32 1 ; <i8**> [#uses=1] store i8* %11, i8** %12 %13 = getelementptr { { i8*, i8* }*, i8* }* %3, i32 0 ; <{ { i 8*, i8* }*, i8* }*> [#uses=1] %14 = load { { i8*, i8* }*, i8* }* %13 ; <{ { i8*, i8* }*, i8* }> [#uses=2] %15 = icmp eq i32 %1, 0 ; <i1> [#uses=1] br i1 %15, label %pass, label %fail fail: ; preds = %start %16 = sub i32 %1, 1 ; <i32> [#uses=1] %17 = tail call fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* } %14, i32 %16) ; <{ { i8*, i8* }*, i8* }> [#uses=1] ret { { i8*, i8* }*, i8* } %17 pass: ; preds = %start ret { { i8*, i8* }*, i8* } %14 } Am I going mad or should that tail call three lines up not be leaking stack space? The only possible explanation I can think of is that LLVM believes it cannot make this a tail call because it thinks it is passing a pointer to a struct that is local to the caller. Is that correct and, if so, how can i work around it? -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
On Sunday 22 February 2009 14:17:34 Jon Harrop wrote:> define fastcc { { i8*, i8* }*, i8* } @init({ { i8*, i8* }*, i8* }, i32) {I just noticed that I am accidentally returning a struct rather than going via a pointer in the first argument. Might that be related? -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
Hi Jon,> I have written a variety tests of tail calls for my HLVM and all passed with > flying colors until I wrote this test (which is actually for algebraic > datatypes) and discovered that it segfaults after ~100k iterations through > what I think should be a tail call. Here's the IR:is this really a tail call? I didn't look closely but at a glance it seems to be passing a local stack variable as a call parameter. Ciao, Duncan.
On Sunday 22 February 2009 20:36:52 Duncan Sands wrote:> Hi Jon, > > > I have written a variety tests of tail calls for my HLVM and all passed > > with flying colors until I wrote this test (which is actually for > > algebraic datatypes) and discovered that it segfaults after ~100k > > iterations through what I think should be a tail call. Here's the IR: > > is this really a tail call?>From what I have understood of the LLVM docs about when tail calls geteliminated on x86 and x64 it should be a tail call, yes. http://llvm.org/docs/CodeGenerator.html#tailcallopt . Caller and callee have the calling convention fastcc. . The call is a tail call - in tail position (ret immediately follows call and ret uses value of call or is void). . Option -tailcallopt is enabled. . No variable argument lists are used. . On x86-64 when generating GOT/PIC code only module-local calls (visibility = hidden or protected) are supported. Those are all satisfied.> I didn't look closely but at a glance it seems to be passing a local stack > variable as a call parameter.In this case, the arguments are a { { i8*, i8* }*, i8* } and a i32. As I understand it, first-class structs are simply unpacked for argument passing so that is equivalent to passing { i8*, i8* }* and i8* and i32. In this case, the first is a pointer to a global variable and the second is a pointer to a malloc'd block. So I don't see why any of the arguments should be inhibiting tail call elimination. I just tested my theory that returning a first-class struct from a function inhibits tail call elimination and it seems that I was correct: altering this function to pass its return struct by pointer in the first argument fixes the stack overflow. Is this a bug in LLVM? -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e