On Friday 13 November 2009 16:26:01 Chris Lattner wrote:> On Nov 13, 2009, at 3:34 AM, Jon Harrop wrote:
> >> Point 4 is the one that caused me trouble for some time.
> >> Unfortunately
> >> it causes a bad interaction with the optimiser, specifically the
> >> 'simplifycfg' pass. What seems to happen is that since the
function
> >> you are calling is marked with 'noreturn', the simplifycfg
pass will
> >> then put a 'unreachable' statement after the tail call to
that
> >> function. This stuffs up the tail call though as the llc compiler
> >> requires tail calls be followed by a 'return void'
statement. In my
> >> case this caused my compiled programs to segfault if the llvm
> >> optimiser was run on them.
> >
> > Isn't that a bug in LLVM?
>
> Yes, please file a bugzilla with a small example.
I think I may be having a similar problem, albeit with with noalias rather
than noreturn. This is my tail recursive function:
define fastcc noalias i8* @fold_aux(%2*, i32, i8* (%2*, %2, double)*, %2, %0)
{
entry:
%5 = load i32* @shadow_stack_depth ; <i32> [#uses=4]
%6 = alloca %2, align 8 ; <%2*> [#uses=2]
%7 = load %0* @shadow_stack ; <%0> [#uses=1]
%8 = extractvalue %0 %7, 2 ; <i8*> [#uses=1]
%9 = bitcast i8* %8 to %0* ; <%0*> [#uses=1]
%10 = sext i32 %5 to i64 ; <i64> [#uses=1]
%11 = getelementptr %0* %9, i64 %10 ; <%0*> [#uses=1]
store %0 %4, %0* %11
%12 = add i32 %5, 1 ; <i32> [#uses=1]
store i32 %12, i32* @shadow_stack_depth
%13 = load i32* @n_allocated ; <i32> [#uses=1]
%14 = load i32* @quota ; <i32> [#uses=1]
%15 = icmp sgt i32 %13, %14 ; <i1> [#uses=1]
br i1 %15, label %fail, label %cont
fail: ; preds = %entry
%16 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]*
@buf16, i64 0, i64 0)) ; <i32> [#uses=0]
%17 = load double ()** @hlvm_time ; <double ()*> [#uses=1]
%18 = call double %17() ; <double> [#uses=1]
%19 = call fastcc i8* @gc_mark(i32 0) ; <i8*> [#uses=0]
%20 = load double* @mark_time ; <double> [#uses=1]
%21 = load double ()** @hlvm_time ; <double ()*> [#uses=1]
%22 = call double %21() ; <double> [#uses=1]
%23 = fadd double %20, %22 ; <double> [#uses=1]
%24 = fsub double %23, %18 ; <double> [#uses=1]
store double %24, double* @mark_time
%25 = load double ()** @hlvm_time ; <double ()*> [#uses=1]
%26 = call double %25() ; <double> [#uses=1]
%27 = call fastcc i8* @gc_sweep(i32 0) ; <i8*> [#uses=0]
%28 = load double* @sweep_time ; <double> [#uses=1]
%29 = load double ()** @hlvm_time ; <double ()*> [#uses=1]
%30 = call double %29() ; <double> [#uses=1]
%31 = fadd double %28, %30 ; <double> [#uses=1]
%32 = fsub double %31, %26 ; <double> [#uses=1]
store double %32, double* @sweep_time
%33 = load i32* @n_allocated ; <i32> [#uses=1]
%34 = shl i32 %33, 2 ; <i32> [#uses=1]
%35 = add i32 %34, 256 ; <i32> [#uses=1]
store i32 %35, i32* @quota
br label %cont
cont: ; preds = %fail, %entry
%36 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([10 x i8]*
@buf17, i64 0, i64 0)) ; <i32> [#uses=0]
%37 = extractvalue %0 %4, 1 ; <i32> [#uses=1]
%38 = icmp sgt i32 %37, %1 ; <i1> [#uses=1]
br i1 %38, label %pass2, label %fail1
fail1: ; preds = %cont
store i32 %5, i32* @shadow_stack_depth
store %2 %3, %2* %0
ret i8* undef
pass2: ; preds = %cont
%39 = add i32 %1, 1 ; <i32> [#uses=1]
%40 = icmp slt i32 %1, 0 ; <i1> [#uses=1]
br i1 %40, label %fail6, label %pass4
pass4: ; preds = %pass2
%41 = extractvalue %0 %4, 1 ; <i32> [#uses=1]
%42 = icmp sgt i32 %41, %1 ; <i1> [#uses=1]
br i1 %42, label %cont8, label %fail6
fail6: ; preds = %pass4, %pass2
%43 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([27 x i8]*
@buf5, i64 0, i64 0)) ; <i32> [#uses=0]
call void @exit(i32 1)
br label %cont8
cont8: ; preds = %fail6, %pass4
%44 = extractvalue %0 %4, 2 ; <i8*> [#uses=1]
%45 = bitcast i8* %44 to double* ; <double*> [#uses=1]
%46 = sext i32 %1 to i64 ; <i64> [#uses=1]
%47 = getelementptr double* %45, i64 %46 ; <double*> [#uses=1]
%48 = load double* %47 ; <double> [#uses=1]
%49 = call fastcc i8* %2(%2* %6, %2 %3, double %48) ; <i8*> [#uses=0]
%50 = load %2* %6 ; <%2> [#uses=1]
store i32 %5, i32* @shadow_stack_depth
%51 = tail call fastcc i8* @fold_aux(%2* %0, i32 %39, i8* (%2*, %2,
double)* %2, %2 %50, %0 %4) ; <i8*> [#uses=1]
ret i8* %51
}
That last call is the critical tail call and AFAICT it is stack overflowing.
Anyone seen any problems like this before?
I'll try removing the noalias and seeing if that fixes it. If so, I'll
file a
bug report. Which part of LLVM would this be a bug in if things like noalias
and noreturn inhibit TCO when they shouldn't?
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e