On Oct 2, 2007, at 2:27 AM, Arnold Schwaighofer wrote:> Hi all, > > I changed the code that checks whether a tail call is really > eligible for optimization so that it performs the check/fix in > SelectionDAGISel.cpp:BuildSelectionDAG() as suggest by Evan. Also > eliminated an error that caused the remaining failing test cases in > the test-suite. > > The results look very nice (on darwin x86, r42486). > The same number (46) of failing test cases on patched version and > vanilla version. LCCBETA was enabled on both. I changed the LCCBETA > option in Makefile.programs in the patched version to include the > tail-call-opt flags so that the optimization code gets exercised.Ok. I'll review the patch soon. Thanks.> > > ifeq ($(ARCH),x86) > LLCBETAOPTION := -regalloc=local -fast -tail-call-opt -tail-call-opt- > align-stackPlease remove -regalloc=local -fast. We want to test this patch separately. Can you explain the advantages / disadvantages of -tail- call-opt-align-stack?> > > On 26 Sep 2007, at 02:26, Chris Lattner wrote: >> I think enabling this as llcbeta for a few nights makes >> sense before turning it on by default. >> >> -Chris > > What does turning on by default mean? Does that mean it is shown in > llc -help or that it is performed every time (without requesting via > the command line) when compiling?The later.> > > I would not do the latter since tail call optimization clears the > stack frame of the caller (sequence), possibly causing confusion > when debugging also resulting code is probably not faster (since > arguments are lowered to regular stack slot -where they would be if > the call wasRunning it as llcbeta will hopefully help us identify the performance issues so we can fix them before it's turned on by default.> a normal call - first and then moved to the real stack slot - onto > the callers parameters. As noted in the source file this might be > optimized in many cases by looking whether the actual parameters > come from the caller function parameters and would be overwritten if > they are not moved. In cases where they would not be overwritten > they could be directly lowered to the real stack slot).Please add the potential optimizations (with concrete examples / assembly snippets) to README.txt under Target/X86. Thanks! Evan> > > regards arnold > > > <tailcall-r42525- > src.patch>_______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > ifeq ($(ARCH),x86) > > LLCBETAOPTION := -regalloc=local -fast -tail-call-opt -tail-call-opt- > > align-stackokay i ll do another round of testing with LLCBETAOPTION := -tail-call-opt -tail-call-opt-align-stack> Please remove -regalloc=local -fast. We want to test this patch > separately. Can you explain the advantages / disadvantages of -tail- > call-opt-align-stack?When you do tail call optimization just before calling/jmping to the callee you sometimes have to adjust the stack pointer. e.g int calller(int arg1, int arg2) { return callee(arg2, arg1, 2*arg3; } conceptually before jumping to callee the stackpointer has to be adjusted (by -4 bytes in the example). Now this can cause the stack to be misaligned (according to the target abi. In order to prevent this when calculating the argument size (and when tail-call-opt-align-stack is enabled) i make the resulting argument size a multiple of the target alignment (minus the return address slot). So in the example above the argument stack slot size would be 12bytes for both functions.(on darwin-x86 which requires the stack to be 16byte aligned - which i found out by mistake - because dynamically linked function calls would not work ;) this results in a stack adjustment that is a multiple of the target stack alignment. Maybe the default should be to perform this stack alignment and turn it off with a switch (tail-call-opt-disable-stack-alignment) when required? which one would do if he knew that he never calls to dynamically linked functions on darwin for example or only from a top level function (one that is not called from others eg. main). why didn't i made the switch tail-call-opt-disable-stack-alignment in the first place? hmm stupidity comes to ones mind ;) no i had the llvm as a backend for functional languages in mind which might not require the stack to be aligned and only call printf form a top level function. in short: the disadvantage of -tail-call-opt-align-stack is that the function stack frame is bigger. the advantage is that it won't blow up if for example the darwin linker requires functions to be 16 byte aligned :)> > What does turning on by default mean? Does that mean it is shown in > > llc -help or that it is performed every time (without requesting via > > the command line) when compiling? > > The later. > > > > > > > I would not do the latter since tail call optimization clears the > > stack frame of the caller (sequence), possibly causing confusion > > when debugging also resulting code is probably not faster (since > > arguments are lowered to regular stack slot -where they would be if > > the call was > > Running it as llcbeta will hopefully help us identify the performance > issues so we can fix them before it's turned on by default.Okay.> Please add the potential optimizations (with concrete examples / > assembly snippets) to README.txt under Target/X86. Thanks!will do. regards arnold
On 4 Oct 2007, at 00:22, Evan Cheng wrote:>> ifeq ($(ARCH),x86) >> LLCBETAOPTION := -regalloc=local -fast -tail-call-opt -tail-call-opt- >> align-stack > Please remove -regalloc=local -fast. We want to test this patch > separately.just did a test with LLCBETAOPTION := -tail-call-opt -tail-call-opt-align-stack this time only SPASS llc-beta fails (comparing with vanilla version) also jit fails (exit 139) but in both versions (vanilla/patched) which it did not do when i last tested but looking very briefly at the results i am not sure llc-beta really fails since the programm (SPASS) runs through and only the diff fails as you can see below the tail call optimized version needs more clauses to do the prove but eventually end up with the same result: tail-call-opt: .... Given clause: 29875[3:Res:23.2,29765.0] || member(U,singleton (universal_class))* -> . SPASS V 2.1 SPASS beiseite: Proof found. Problem: /Users/arnold/Desktop/testing/tc-test/llvm/projects/llvm- test/MultiSource/Applications/SPASS/problem.dfg SPASS derived 39592 clauses, backtracked 10752 clauses and kept 23124 clauses. SPASS allocated 3905 KBytes. SPASS spent No Timing on this machine. on the problem. No Timing on this machine. for the input. No Timing on this machine. for the FLOTTER CNF translation. No Timing on this machine. for inferences. No Timing on this machine. for the backtracking. No Timing on this machine. for the reduction. --------------------------SPASS-STOP------------------------------ exit 0 vanilla: .... Given clause: 29876[3:Res:23.2,29766.0] || member(U,singleton (universal_class))* -> . SPASS V 2.1 SPASS beiseite: Proof found. Problem: /Users/arnold/Desktop/testing/tc-test/llvm/projects/llvm- test/MultiSource/Applications/SPASS/problem.dfg SPASS derived 39572 clauses, backtracked 10747 clauses and kept 23119 clauses. SPASS allocated 3885 KBytes. SPASS spent No Timing on this machine. on the problem. No Timing on this machine. for the input. No Timing on this machine. for the FLOTTER CNF translation. No Timing on this machine. for inferences. No Timing on this machine. for the backtracking. No Timing on this machine. for the reduction. --------------------------SPASS-STOP------------------------------ exit 0 an arbitrary example taken from > diff SPASS.out-llc SPASS.out-llc-beta 3870,3872c3870,3872 < Given clause: 29104[0:Res:357.1,166.0] || equal (universal_class,singleton_relation) -> member(y,element_relation)*. < Given clause: 29115[0:Res:361.1,166.0] || equal (universal_class,singleton_relation) -> member (universal_class,element_relation)*. < Given clause: 29213[3:SpR:29211.0,5371.0] || -> subclass (universal_class,complement(intersection(singleton (universal_class),universal_class)))*. --- > Given clause: 29103[0:Res:357.1,166.0] || equal (universal_class,singleton_relation) -> member(y,element_relation)*. > Given clause: 29114[0:Res:361.1,166.0] || equal (universal_class,singleton_relation) -> member (universal_class,element_relation)*. > Given clause: 29212[3:SpR:29210.0,5371.0] || -> subclass (universal_class,complement(intersection(singleton (universal_class),universal_class)))*. very weird indeed ;) regards arnold
On Oct 4, 2007, at 5:29 AM, Arnold Schwaighofer wrote:> > in short: > the disadvantage of -tail-call-opt-align-stack is that the function > stack frame is bigger. > the advantage is that it won't blow up if for example the darwin > linker requires functions to be 16 byte aligned :)Thank you for the explanation. I'd much rather not have this separate option. IMHO we should just make this be controlled by -tail-call-opt for now. Evan> > >>> What does turning on by default mean? Does that mean it is shown in >>> llc -help or that it is performed every time (without requesting via >>> the command line) when compiling? >> >> The later. >> >>> >>> >>> I would not do the latter since tail call optimization clears the >>> stack frame of the caller (sequence), possibly causing confusion >>> when debugging also resulting code is probably not faster (since >>> arguments are lowered to regular stack slot -where they would be if >>> the call was >> >> Running it as llcbeta will hopefully help us identify the performance >> issues so we can fix them before it's turned on by default. > > Okay. > >> Please add the potential optimizations (with concrete examples / >> assembly snippets) to README.txt under Target/X86. Thanks! > will do. > > regards arnold > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Oct 4, 2007, at 2:31 PM, Arnold Schwaighofer wrote:> > On 4 Oct 2007, at 00:22, Evan Cheng wrote: >>> ifeq ($(ARCH),x86) >>> LLCBETAOPTION := -regalloc=local -fast -tail-call-opt -tail-call- >>> opt- >>> align-stack >> Please remove -regalloc=local -fast. We want to test this patch >> separately. > > just did a test with > LLCBETAOPTION := -tail-call-opt -tail-call-opt-align-stack > > this time only SPASS llc-beta fails (comparing with vanilla version) > also jit fails (exit 139) but in both versions (vanilla/patched) > which it did not do when i last testedYou mean SPASS JIT fails without tail call optimization being turned on? Is this due to fastcc abi change (callee pops arguments)?> > > but looking very briefly at the results i am not sure llc-beta really > fails since the programm (SPASS) runs through and only the diff failsIf the output mismatches then it has failed. This will not stop the patch from being accepted but it will have to be fixed eventually. Evan> > > as you can see below the tail call optimized version needs more > clauses to do the prove but eventually end up with the same result: > > tail-call-opt: > .... > Given clause: 29875[3:Res:23.2,29765.0] || member(U,singleton > (universal_class))* -> . > SPASS V 2.1 > SPASS beiseite: Proof found. > Problem: /Users/arnold/Desktop/testing/tc-test/llvm/projects/llvm- > test/MultiSource/Applications/SPASS/problem.dfg > SPASS derived 39592 clauses, backtracked 10752 clauses and kept 23124 > clauses. > SPASS allocated 3905 KBytes. > SPASS spent No Timing on this machine. on the problem. > No Timing on this machine. for the input. > No Timing on this machine. for the FLOTTER CNF > translation. > No Timing on this machine. for inferences. > No Timing on this machine. for the backtracking. > No Timing on this machine. for the reduction. > > --------------------------SPASS-STOP------------------------------ > exit 0 > > vanilla: > .... > Given clause: 29876[3:Res:23.2,29766.0] || member(U,singleton > (universal_class))* -> . > SPASS V 2.1 > SPASS beiseite: Proof found. > Problem: /Users/arnold/Desktop/testing/tc-test/llvm/projects/llvm- > test/MultiSource/Applications/SPASS/problem.dfg > SPASS derived 39572 clauses, backtracked 10747 clauses and kept 23119 > clauses. > SPASS allocated 3885 KBytes. > SPASS spent No Timing on this machine. on the problem. > No Timing on this machine. for the input. > No Timing on this machine. for the FLOTTER CNF > translation. > No Timing on this machine. for inferences. > No Timing on this machine. for the backtracking. > No Timing on this machine. for the reduction. > > --------------------------SPASS-STOP------------------------------ > exit 0 > > > an arbitrary example taken from >> diff SPASS.out-llc SPASS.out-llc-beta > 3870,3872c3870,3872 > < Given clause: 29104[0:Res:357.1,166.0] || equal > (universal_class,singleton_relation) -> member(y,element_relation)*. > < Given clause: 29115[0:Res:361.1,166.0] || equal > (universal_class,singleton_relation) -> member > (universal_class,element_relation)*. > < Given clause: 29213[3:SpR:29211.0,5371.0] || -> subclass > (universal_class,complement(intersection(singleton > (universal_class),universal_class)))*. > --- >> Given clause: 29103[0:Res:357.1,166.0] || equal > (universal_class,singleton_relation) -> member(y,element_relation)*. >> Given clause: 29114[0:Res:361.1,166.0] || equal > (universal_class,singleton_relation) -> member > (universal_class,element_relation)*. >> Given clause: 29212[3:SpR:29210.0,5371.0] || -> subclass > (universal_class,complement(intersection(singleton > (universal_class),universal_class)))*. > > very weird indeed ;) > > regards arnold > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev