On Tue, 4 May 2004, Chris Lattner wrote:
> I suspect that a large reason that LLVM does worse than a native C
> compiler with the CBE+GCC is that LLVM generates very low-level C code,
> and I'm not convinced that GCC is doing a very good job (ie, without
> syntactic loops).

Yup, this is EXACTLY what is going on.  I took this very simple C function:

int Array[1000];
void test(int X) {
  int i;
  for (i = 0; i < 1000; ++i)
    Array[i] += X;
}

Compiling with -O3 on OS/X gave me this:

_test:
        mflr r5
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r2
        mtlr r5
        addis r4,r2,ha16(L_Array$non_lazy_ptr-"L00000000001$pb")
        li r2,0
        lwz r9,lo16(L_Array$non_lazy_ptr-"L00000000001$pb")(r4)
        li r4,1000
        mtctr r4
L9:
        lwzx r7,r2,r9       ; load
        add r6,r7,r3        ; add
        stwx r6,r2,r9       ; store
        addi r2,r2,4        ; Increment pointer
        bdnz L9             ; Decrement count register, branch while not zero
        blr

This is nice code, good GCC. :)  Okay, LLVM currently generates this code
from the CBE:

void test(int l7_X) {
  unsigned l8_indvar;
  unsigned l8_indvar__PHI_TEMPORARY;
  int *l14_tmp_2E_5;
  int l7_tmp_2E_9;
  unsigned l8_indvar_2E_next;

  l8_indvar__PHI_TEMPORARY = 0u;   /* for PHI node */
l13_no_exit:
  l8_indvar = l8_indvar__PHI_TEMPORARY;
  l14_tmp_2E_5 = &Array[l8_indvar];
  l7_tmp_2E_9 = *l14_tmp_2E_5;
  *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
  l8_indvar_2E_next = l8_indvar + 1u;
  if (!(l8_indvar_2E_next == 1000u)) {
    l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next;   /* for PHI node */
    goto l13_no_exit;
  }
  return;
}

This has exactly the same operations in the loop, so GCC should produce the
same code, right?  Wrong:

_test:
        mflr r4
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r2
        mtlr r4
        li r11,0
        addis r10,r2,ha16(_Array-"L00000000001$pb")
L2:
        slwi r2,r11,2       ; Shift left "i" by 2
        la r5,lo16(_Array-"L00000000001$pb")(r10)
        cmpwi cr0,r11,999   ; compare i to the trip count
        lwzx r7,r2,r5       ; Load from array
        addi r11,r11,1      ; increment "i"
        add r6,r7,r3        ; Add value to array value
        stwx r6,r2,r5       ; store into array
        bne+ cr0,L2         ; Loop until done
        blr

Hrm, basically gcc is not doing ANY loop optimization (e.g. strength
reduction or "do-loop" optimization) whatsoever.  I'm sure that the X86
GCC is suffering from the same problems, it's just that X86 doesn't depend
on strength reduction and do-loop optimization as much, so it's not so
pronounced.

Interestingly, if I tweak the .cbe code to be this:

  do {
    l8_indvar = l8_indvar__PHI_TEMPORARY;
    l14_tmp_2E_5 = &Array[l8_indvar];
    l7_tmp_2E_9 = *l14_tmp_2E_5;
    *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
    l8_indvar_2E_next = l8_indvar + 1u;
    l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next;   /* for PHI node */
  } while (!(l8_indvar_2E_next == 1000u));

GCC generates the nice code again, virtually identical to the code from
the original source.  AAAH!  :)

Maybe this is a good argument for making the CBE generate syntactic loops
in simple cases.  I may have some time to try implementing this on the
weekend.  That is, if no one beats me to it. :)

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
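For anyone who wants to try the two loop shapes side by side, here is a minimal sketch, assuming simplified variable names (indvar, indvar_phi) in place of the CBE-generated ones. The function names test_goto, test_dowhile and loops_agree are made up for this example and are not part of LLVM or the CBE. It only checks that the goto form and the do/while form compute the same result; whether a given GCC optimizes them differently has to be checked by compiling each with -O3 -S and reading the assembly.

```c
#include <string.h>

#define N 1000u

static int Array[N];

/* Loop shaped like the CBE output: a label, a PHI temporary, and a
   conditional goto that forms the back edge. */
static void test_goto(int X) {
  unsigned indvar, indvar_next;
  unsigned indvar_phi = 0u;            /* for PHI node */
no_exit:
  indvar = indvar_phi;
  Array[indvar] += X;
  indvar_next = indvar + 1u;
  if (!(indvar_next == N)) {
    indvar_phi = indvar_next;          /* for PHI node */
    goto no_exit;
  }
}

/* Same operations, with the back edge written as a syntactic do/while. */
static void test_dowhile(int X) {
  unsigned indvar, indvar_next;
  unsigned indvar_phi = 0u;            /* for PHI node */
  do {
    indvar = indvar_phi;
    Array[indvar] += X;
    indvar_next = indvar + 1u;
    indvar_phi = indvar_next;          /* for PHI node */
  } while (!(indvar_next == N));
}

/* Runs both forms from a zeroed array; returns 1 if the results match. */
int loops_agree(int X) {
  static int after_goto[N];
  memset(Array, 0, sizeof Array);
  test_goto(X);
  memcpy(after_goto, Array, sizeof Array);
  memset(Array, 0, sizeof Array);
  test_dowhile(X);
  return memcmp(after_goto, Array, sizeof Array) == 0;
}
```

Calling loops_agree(5) from a small driver should return 1, since both functions add X to each of the 1000 elements exactly once.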
On May 4, 2004, at 10:36 PM, Chris Lattner wrote:
> On Tue, 4 May 2004, Chris Lattner wrote:
>> I suspect that a large reason that LLVM does worse than a native C
>> compiler with the CBE+GCC is that LLVM generates very low-level C
>> code,
>> and I'm not convinced that GCC is doing a very good job (ie, without
>> syntactic loops).
>
> Yup, this is EXACTLY what is going on.

Interesting. Now that you mention it, I do recall thinking the loops
that llvm generated looked a bit different than the gcc loops. I'll go
back and take another look, but this might explain some of that
discrepancy.

In response to the other email:

I'm using the -native-cbe option to generate the code. From your last
email, it sounds like when you specify this option, it compiles
everything to the llvm code, then the CBE generates C code based on the
generated llvm code, and THAT is what is compiled to native code rather
than the original code itself?

That was one of the other things I was going to get around to asking
about eventually. It seems like llvm takes an eternity and a half to
link these programs, but if this "linking" is really llvm code -> c code
-> compile & link again, that would explain why it takes so long.

I have to confess I'm not as familiar with gcc as I'd like to be. Where
would gcc put the .s file (or what flags do I have to specify to create
one)? Also, where would llvm put the llvm code and the C code that the
backend generates (or what do I need to specify to tell it to keep that
around)?

I took a look at the inlining decisions that llvm prints out when you
specify -Wl,-debug-only=inline. Let me make sure I understand how to
interpret this output correctly:

Inliner visiting SCC: .gen_codes_26
Inspecting function: .gen_codes_26
Inlining: cost=100, Call: %tmp.37 = call uint %bi_reverse( uint %tmp.42, int %tmp.27 ) ; <uint> [#uses=1]
Inliner visiting SCC: .pqdownheap_35
Inspecting function: .pqdownheap_35
Inliner visiting SCC: .build_tree_41
Inspecting function: .build_tree_41
NOT Inlining: cost=501, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int %n.1.0 )
NOT Inlining: cost=466, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int 1 )
NOT Inlining: cost=466, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int 1 )
NOT Inlining: cost=406, Call: call void %.gen_codes_26( %struct.ct_data* %tmp.2, int %max_code.1.0 )

So it looks at each function to try to determine if it should be
inlined, comes up with a "cost" to inline it based on what it takes as
parameters and how often it's called, and if this cost is below the
threshold specified then it inlines it?

What about this build_tree function from the log? It says multiple times
it's not inlined. Is the decision whether to inline it or not made for
the function as a whole (eg always inline or always don't) or is it
decided on a call by call basis? What does it mean for something like
pqdownheap where it doesn't give a cost with a yea or nay?

Also, on an unrelated note, I could have sworn all those benchmarks
compiled but I went back to double check and I saw that there were a
few problems.

253.perlbmk builds fine but crashes when running through the spec test.
I recall seeing a note a few places on the website that said perlbmk
didn't work properly due to a longjmp bug. Is that still a known bug or
should I try running that through a debugger to find the problem?

176.gcc generates an ICE when trying to compile with llvm and -O3.
Here's the build log. Are there any other files that might shed more
light on this problem?

Patrick We will use: 176.gcc Compiling Binaries Building 176.gcc ref base ppc32_llvm default specmake clean 2> make.err | tee make.out rm -rf cc1 cc1.exe *.o core *.err *.out specmake build 2> make.err | tee make.out /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-parse.o -DHOST_WORDS_BIG_ENDIAN -O3 c-parse.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-lang.o -DHOST_WORDS_BIG_ENDIAN -O3 c-lang.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-lex.o -DHOST_WORDS_BIG_ENDIAN -O3 c-lex.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-pragma.o -DHOST_WORDS_BIG_ENDIAN -O3 c-pragma.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-decl.o -DHOST_WORDS_BIG_ENDIAN -O3 c-decl.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-typeck.o -DHOST_WORDS_BIG_ENDIAN -O3 c-typeck.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-convert.o -DHOST_WORDS_BIG_ENDIAN -O3 c-convert.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-aux-info.o -DHOST_WORDS_BIG_ENDIAN -O3 c-aux-info.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-common.o -DHOST_WORDS_BIG_ENDIAN -O3 c-common.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-iterate.o -DHOST_WORDS_BIG_ENDIAN -O3 c-iterate.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o toplev.o -DHOST_WORDS_BIG_ENDIAN -O3 toplev.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o version.o -DHOST_WORDS_BIG_ENDIAN -O3 version.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o tree.o -DHOST_WORDS_BIG_ENDIAN -O3 tree.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o print-tree.o -DHOST_WORDS_BIG_ENDIAN -O3 print-tree.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o stor-layout.o -DHOST_WORDS_BIG_ENDIAN -O3 stor-layout.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o 
fold-const.o -DHOST_WORDS_BIG_ENDIAN -O3 fold-const.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o function.o -DHOST_WORDS_BIG_ENDIAN -O3 function.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o stmt.o -DHOST_WORDS_BIG_ENDIAN -O3 stmt.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o expr.o -DHOST_WORDS_BIG_ENDIAN -O3 expr.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o calls.o -DHOST_WORDS_BIG_ENDIAN -O3 calls.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o expmed.o -DHOST_WORDS_BIG_ENDIAN -O3 expmed.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o explow.o -DHOST_WORDS_BIG_ENDIAN -O3 explow.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o optabs.o -DHOST_WORDS_BIG_ENDIAN -O3 optabs.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o varasm.o -DHOST_WORDS_BIG_ENDIAN -O3 varasm.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o rtl.o -DHOST_WORDS_BIG_ENDIAN -O3 rtl.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o print-rtl.o -DHOST_WORDS_BIG_ENDIAN -O3 print-rtl.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o rtlanal.o -DHOST_WORDS_BIG_ENDIAN -O3 rtlanal.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o emit-rtl.o -DHOST_WORDS_BIG_ENDIAN -O3 emit-rtl.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o real.o -DHOST_WORDS_BIG_ENDIAN -O3 real.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o dbxout.o -DHOST_WORDS_BIG_ENDIAN -O3 dbxout.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o sdbout.o -DHOST_WORDS_BIG_ENDIAN -O3 sdbout.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o dwarfout.o -DHOST_WORDS_BIG_ENDIAN -O3 dwarfout.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o xcoffout.o -DHOST_WORDS_BIG_ENDIAN -O3 xcoffout.c 
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o integrate.o -DHOST_WORDS_BIG_ENDIAN -O3 integrate.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o jump.o -DHOST_WORDS_BIG_ENDIAN -O3 jump.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o cse.o -DHOST_WORDS_BIG_ENDIAN -O3 cse.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o loop.o -DHOST_WORDS_BIG_ENDIAN -O3 loop.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o unroll.o -DHOST_WORDS_BIG_ENDIAN -O3 unroll.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o flow.o -DHOST_WORDS_BIG_ENDIAN -O3 flow.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o stupid.o -DHOST_WORDS_BIG_ENDIAN -O3 stupid.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o combine.o -DHOST_WORDS_BIG_ENDIAN -O3 combine.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o regclass.o -DHOST_WORDS_BIG_ENDIAN -O3 regclass.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o local-alloc.o -DHOST_WORDS_BIG_ENDIAN -O3 local-alloc.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o global.o -DHOST_WORDS_BIG_ENDIAN -O3 global.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reload.o -DHOST_WORDS_BIG_ENDIAN -O3 reload.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reload1.o -DHOST_WORDS_BIG_ENDIAN -O3 reload1.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o caller-save.o -DHOST_WORDS_BIG_ENDIAN -O3 caller-save.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-peep.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-peep.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reorg.o -DHOST_WORDS_BIG_ENDIAN -O3 reorg.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o sched.o -DHOST_WORDS_BIG_ENDIAN -O3 sched.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o final.o 
-DHOST_WORDS_BIG_ENDIAN -O3 final.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o recog.o -DHOST_WORDS_BIG_ENDIAN -O3 recog.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reg-stack.o -DHOST_WORDS_BIG_ENDIAN -O3 reg-stack.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-opinit.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-opinit.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-recog.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-recog.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-extract.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-extract.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-output.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-output.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-emit.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-emit.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-attrtab.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-attrtab.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o m88k.o -DHOST_WORDS_BIG_ENDIAN -O3 m88k.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o getpwd.o -DHOST_WORDS_BIG_ENDIAN -O3 getpwd.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o convert.o -DHOST_WORDS_BIG_ENDIAN -O3 convert.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o bc-emit.o -DHOST_WORDS_BIG_ENDIAN -O3 bc-emit.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o bc-optab.o -DHOST_WORDS_BIG_ENDIAN -O3 bc-optab.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o obstack.o -DHOST_WORDS_BIG_ENDIAN -O3 obstack.c /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -Wl,-native-cbe -O3 c-parse.o c-lang.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-iterate.o toplev.o version.o tree.o print-tree.o stor-layout.o fold-const.o function.o stmt.o expr.o calls.o expmed.o explow.o optabs.o varasm.o rtl.o print-rtl.o 
rtlanal.o emit-rtl.o real.o dbxout.o sdbout.o dwarfout.o xcoffout.o integrate.o jump.o cse.o loop.o unroll.o flow.o stupid.o combine.o regclass.o local-alloc.o global.o reload.o reload1.o caller-save.o insn-peep.o reorg.o sched.o final.o recog.o reg-stack.o insn-opinit.o insn-recog.o insn-extract.o insn-output.o insn-emit.o insn-attrtab.o m88k.o getpwd.o convert.o bc-emit.o bc-optab.o obstack.o -lm -o cc1 WARNING: While resolving call to function '.plain_type_6' arguments were dropped! WARNING: While resolving call to function '.plain_type_6' arguments were dropped! WARNING: While resolving call to function '.plain_type_6' arguments were dropped! combine.c: In function `find_split_point': combine.c:2443: warning: function returns address of local variable combine.c:2509: warning: function returns address of local variable combine.c:2683: warning: function returns address of local variable combine.c:2695: warning: function returns address of local variable combine.c:2747: warning: function returns address of local variable WARNING: Type conflict between types named 'struct.var_refs_queue'. Src=' %struct.var_refs_queue'. Dest=' %struct.var_refs_queue' WARNING: Type conflict between types named 'struct.sequence_stack'. Src=' %struct.sequence_stack'. Dest=' %struct.sequence_stack' WARNING: Type conflict between types named 'struct.function'. Src=' %struct.function'. Dest=' %struct.function' WARNING: Type conflict between types named 'struct.var_refs_queue'. Src=' %struct.var_refs_queue'. Dest=' %struct.var_refs_queue' WARNING: Type conflict between types named 'struct.sequence_stack'. Src=' %struct.sequence_stack'. Dest=' %struct.sequence_stack' WARNING: Type conflict between types named 'struct.function'. Src=' %struct.function'. Dest=' %struct.function' WARNING: Type conflict between types named 'struct.var_refs_queue'. Src=' %struct.var_refs_queue'. Dest=' %struct.var_refs_queue' WARNING: Type conflict between types named 'struct.sequence_stack'. 
Src=' %struct.sequence_stack'. Dest=' %struct.sequence_stack' WARNING: Type conflict between types named 'struct.function'. Src=' %struct.function'. Dest=' %struct.function' WARNING: Type conflict between types named 'struct.var_refs_queue'. Src=' %struct.var_refs_queue'. Dest=' %struct.var_refs_queue' WARNING: Type conflict between types named 'struct.sequence_stack'. Src=' %struct.sequence_stack'. Dest=' %struct.sequence_stack' WARNING: Type conflict between types named 'struct.function'. Src=' %struct.function'. Dest=' %struct.function' WARNING: Type conflict between types named 'struct.var_refs_queue'. Src=' %struct.var_refs_queue'. Dest=' %struct.var_refs_queue' WARNING: Type conflict between types named 'struct.sequence_stack'. Src=' %struct.sequence_stack'. Dest=' %struct.sequence_stack' WARNING: Type conflict between types named 'struct.function'. Src=' %struct.function'. Dest=' %struct.function' WARNING: Found global types that are not compatible: %struct.rtx_def* (%struct.increment_operator*, %union.tree_node*)* %bc_expand_increment void (%struct.increment_operator*, %union.tree_node*)* %bc_expand_increment WARNING: Found global types that are not compatible: int (...)* %bc_xstrdup sbyte* (sbyte*)* %bc_xstrdup WARNING: Found global types that are not compatible: void (...)* %dump_flow_info void (%struct.__sFILE*)* %dump_flow_info int (...)* %dump_flow_info WARNING: Found global types that are not compatible: int (...)* %expand_expr %struct.rtx_def* (...)* %expand_expr %struct.rtx_def* (%union.tree_node*, %struct.rtx_def*, uint, uint)* %expand_expr WARNING: Found global types that are not compatible: %struct.function* (%union.tree_node*)* %find_function_data { \2, sbyte*, %union.tree_node*, int, int, int, int, int, int, int, int, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, int, int, %struct.rtx_def*, int, int, int, %struct.rtx_def**, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, 
%struct.rtx_def*, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, %struct.rtx_def*, %union.tree_node*, %union.tree_node*, int, %struct.temp_slot*, int, { %struct.rtx_def*, uint, int, \2 }*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, int, int, %union.tree_node*, %struct.rtx_def*, int, sbyte*, int, %struct.goto_fixup*, int, int, %union.tree_node*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, { %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, \2 }*, int, int, sbyte*, sbyte*, int, %struct.rtx_def**, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, int, int, %struct.momentary_level*, sbyte*, sbyte*, sbyte*, sbyte*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.simple_obstack_stack*, int, int, %struct.machine_function*, %struct.rtx_def*, %struct.constant_descriptor**, %struct.pool_sym**, %struct.pool_constant*, %struct.pool_constant*, int }* (%union.tree_node*)* %find_function_data WARNING: Found global types that are not compatible: %struct.function** %outer_function_chain { \2, sbyte*, %union.tree_node*, int, int, int, int, int, int, int, int, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, int, int, %struct.rtx_def*, int, int, int, %struct.rtx_def**, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, %struct.rtx_def*, %union.tree_node*, %union.tree_node*, int, %struct.temp_slot*, int, { %struct.rtx_def*, uint, int, \2 }*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, int, int, %union.tree_node*, %struct.rtx_def*, int, sbyte*, int, 
%struct.goto_fixup*, int, int, %union.tree_node*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, { %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, \2 }*, int, int, sbyte*, sbyte*, int, %struct.rtx_def**, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, int, int, %struct.momentary_level*, sbyte*, sbyte*, sbyte*, sbyte*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.simple_obstack_stack*, int, int, %struct.machine_function*, %struct.rtx_def*, %struct.constant_descriptor**, %struct.pool_sym**, %struct.pool_constant*, %struct.pool_constant*, int }** %outer_function_chain WARNING: While resolving call to function 'gen_call_value' arguments were dropped! WARNING: While resolving call to function 'gen_call' arguments were dropped! WARNING: While resolving call to function 'gen_call_value' arguments were dropped! cc1.cbe.c:1560: warning: conflicting types for built-in function `fprintf' cc1.cbe.c:1640: warning: conflicting types for built-in function `sprintf' cc1.cbe.c:1757: warning: conflicting types for built-in function `strncmp' cc1.cbe.c:1763: warning: conflicting types for built-in function `strchr' cc1.cbe.c:1846: warning: conflicting types for built-in function `memcmp' cc1.cbe.c:2321: warning: conflicting types for built-in function `strrchr' cc1.cbe.c:3048: warning: conflicting types for built-in function `memcpy' cc1.cbe.c:3049: warning: conflicting types for built-in function `memset' cc1.cbe.c: In function `l2493_recog_5': cc1.cbe.c:607726: internal compiler error: in final_scan_insn, at final.c:2189 Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://developer.apple.com/bugreporter> for instructions. 
specmake options 2> options.err | tee options.out COMP: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o options.o -DHOST_WORDS_BIG_ENDIAN -O3 LINK: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -Wl,-native-cbe -O3 -lm -o options Some files did not appear to be built: cc1 *** Error building 176.gcc
On Wed, 5 May 2004, Patrick Flanagan wrote:
> >> and I'm not convinced that GCC is doing a very good job (ie, without
> >> syntactic loops).
> >
> > Yup, this is EXACTLY what is going on.
>
> Interesting. Now that you mention it, I do recall thinking the loops
> that llvm generated looked a bit different than the gcc loops. I'll go
> back and take another look, but this might explain some of that
> discrepancy.

I'll try to put together a "solution" for this some time in the near
future.  Since right now we depend on the CBE for performance, and because
so many people use GCC, we really are REQUIRED to cover for this if we
want to provide competitive performance.  I imagine that this should
improve loop-intensive codes substantially.

> In response to the other email:
>
> I'm using the -native-cbe option to generate the code. From your last
> email, it sounds like when you specify this option, it compiles
> everything to the llvm code, then the CBE generates C code based on the
> generated llvm code, and THAT is what is compiled to native code rather
> than the original code itself?

Yes, that's exactly right.

> That was one of the other things I was going to get around to asking
> about eventually. It seems like llvm takes an eternity and a half to
> link these programs, but if this "linking" is really llvm code -> c code
> -> compile & link again, that would explain why it takes so long.

Yup, that's what's going on.  Until we have a native code generator for
PPC, it will have to stay that way unfortunately.

> I have to confess I'm not as familiar with gcc as I'd like to be. Where
> would gcc put the .s file (or what flags do I have to specify to create
> one)?

If you're using the -native-cbe option, the .s file is removed, and there
isn't a flag to preserve it.  The way to do this is to CBE the .bc file
(which should be generated alongside the native program), and compile it
manually with GCC, i.e.:

  $ llc -march=c foo.bc -o foo.cbe.c
  $ gcc -O3 foo.cbe.c -S
  $ less foo.cbe.s

> Also, where would llvm put the llvm code and the C code that the
> backend generates (or what do I need to specify to tell it to keep that
> around)?

The only way to get that is to use the commands above on the bytecode
file.  The LLVM bytecode file should be generated alongside the native
executable, with a .bc suffix added to its name.

> I took a look at the inlining decisions that llvm prints out when you
> specify -Wl,-debug-only=inline. Let me make sure I understand how to
> interpret this output correctly:
>
> Inliner visiting SCC: .gen_codes_26
> Inspecting function: .gen_codes_26
> Inlining: cost=100, Call: %tmp.37 = call uint %bi_reverse( uint
> %tmp.42, int %tmp.27 ) ; <uint> [#uses=1]
> Inliner visiting SCC: .pqdownheap_35
> Inspecting function: .pqdownheap_35
> Inliner visiting SCC: .build_tree_41
> Inspecting function: .build_tree_41
> NOT Inlining: cost=501, Call: call void %.pqdownheap_35(
> %struct.ct_data* %tmp.2, int %n.1.0 )
> NOT Inlining: cost=466, Call: call void %.pqdownheap_35(
> %struct.ct_data* %tmp.2, int 1 )
> NOT Inlining: cost=466, Call: call void %.pqdownheap_35(
> %struct.ct_data* %tmp.2, int 1 )
> NOT Inlining: cost=406, Call: call void %.gen_codes_26(
> %struct.ct_data* %tmp.2, int %max_code.1.0 )
>
> So it looks at each function to try to determine if it should be
> inlined, comes up with a "cost" to inline it based on what it takes as
> parameters and how often it's called, and if this cost is below the
> threshold specified then it inlines it?

Yes, basically.

> What about this build_tree function from the log? It says multiple times
> it's not inlined. Is the decision whether to inline it or not made for
> the function as a whole (eg always inline or always don't) or is it
> decided on a call by call basis? What does it mean for something like
> pqdownheap where it doesn't give a cost with a yea or nay?

It does this on a call-site by call-site basis.  It appears that
build_tree is calling pqdownheap 3 times.  In two of those, it passes a
constant one as the second argument.  The inliner is looking into the
function and deciding that it could simplify the resultant code a bit
because of the constant, thus the cost to inline pqdownheap for the second
two calls is less than the cost to inline the first.

The "inspecting" and "visiting" lines indicate the function that the
inliner is looking at (i.e., it is inspecting calls inside of that
function).  The inliner works in a "bottom-up" fashion, inlining the leaves
of the call graph before it attempts to inline the roots.

> Also, on an unrelated note, I could have sworn all those benchmarks
> compiled but I went back to double check and I saw that there were a
> few problems.

Okay.

> 253.perlbmk builds fine but crashes when running through the spec test.
> I recall seeing a note a few places on the website that said perlbmk
> didn't work properly due to a longjmp bug. Is that still a known bug or
> should I try running that through a debugger to find the problem?

Hrm, that's an interesting question.  To get setjmp/longjmp working, you
have to pass -enable-correct-eh-support into the 'llc' command.  I don't
think there is any way to get the -native-cbe option to do this, so if you
want to test it, you'll have to use the commands above to compile it by
hand. :(

> 176.gcc generates an ICE when trying to compile with llvm and -O3.
> Here's the build log. Are there any other files that might shed more
> light on this problem?

Hrm, I have no idea.  That's a bug in GCC or Apple's modification to GCC.
You might try filing a bug with them (providing the .cbe.c file); they
might be able to describe a work-around.

-Chris
c-lang.o c-lex.o c-pragma.o > c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-iterate.o > toplev.o version.o tree.o print-tree.o stor-layout.o fold-const.o > function.o stmt.o expr.o calls.o expmed.o explow.o optabs.o varasm.o > rtl.o print-rtl.o rtlanal.o emit-rtl.o real.o dbxout.o sdbout.o > dwarfout.o xcoffout.o integrate.o jump.o cse.o loop.o unroll.o flow.o > stupid.o combine.o regclass.o local-alloc.o global.o reload.o reload1.o > caller-save.o insn-peep.o reorg.o sched.o final.o recog.o reg-stack.o > insn-opinit.o insn-recog.o insn-extract.o insn-output.o insn-emit.o > insn-attrtab.o m88k.o getpwd.o convert.o bc-emit.o bc-optab.o obstack.o > -lm -o cc1 > WARNING: While resolving call to function '.plain_type_6' arguments > were dropped! > WARNING: While resolving call to function '.plain_type_6' arguments > were dropped! > WARNING: While resolving call to function '.plain_type_6' arguments > were dropped! > combine.c: In function `find_split_point': > > combine.c:2443: warning: function returns address of local variable > combine.c:2509: warning: function returns address of local variable > combine.c:2683: warning: function returns address of local variable > combine.c:2695: warning: function returns address of local variable > combine.c:2747: warning: function returns address of local variable > WARNING: Type conflict between types named 'struct.var_refs_queue'. > Src=' %struct.var_refs_queue'. > Dest=' %struct.var_refs_queue' > WARNING: Type conflict between types named 'struct.sequence_stack'. > Src=' %struct.sequence_stack'. > Dest=' %struct.sequence_stack' > WARNING: Type conflict between types named 'struct.function'. > Src=' %struct.function'. > Dest=' %struct.function' > WARNING: Type conflict between types named 'struct.var_refs_queue'. > Src=' %struct.var_refs_queue'. > Dest=' %struct.var_refs_queue' > WARNING: Type conflict between types named 'struct.sequence_stack'. > Src=' %struct.sequence_stack'. 
> Dest=' %struct.sequence_stack' > WARNING: Type conflict between types named 'struct.function'. > Src=' %struct.function'. > Dest=' %struct.function' > WARNING: Type conflict between types named 'struct.var_refs_queue'. > Src=' %struct.var_refs_queue'. > Dest=' %struct.var_refs_queue' > WARNING: Type conflict between types named 'struct.sequence_stack'. > Src=' %struct.sequence_stack'. > Dest=' %struct.sequence_stack' > WARNING: Type conflict between types named 'struct.function'. > Src=' %struct.function'. > Dest=' %struct.function' > WARNING: Type conflict between types named 'struct.var_refs_queue'. > Src=' %struct.var_refs_queue'. > Dest=' %struct.var_refs_queue' > WARNING: Type conflict between types named 'struct.sequence_stack'. > Src=' %struct.sequence_stack'. > Dest=' %struct.sequence_stack' > WARNING: Type conflict between types named 'struct.function'. > Src=' %struct.function'. > Dest=' %struct.function' > WARNING: Type conflict between types named 'struct.var_refs_queue'. > Src=' %struct.var_refs_queue'. > Dest=' %struct.var_refs_queue' > WARNING: Type conflict between types named 'struct.sequence_stack'. > Src=' %struct.sequence_stack'. > Dest=' %struct.sequence_stack' > WARNING: Type conflict between types named 'struct.function'. > Src=' %struct.function'. 
> Dest=' %struct.function' > WARNING: Found global types that are not compatible: > %struct.rtx_def* (%struct.increment_operator*, > %union.tree_node*)* %bc_expand_increment > void (%struct.increment_operator*, %union.tree_node*)* > %bc_expand_increment > WARNING: Found global types that are not compatible: > int (...)* %bc_xstrdup > sbyte* (sbyte*)* %bc_xstrdup > WARNING: Found global types that are not compatible: > void (...)* %dump_flow_info > void (%struct.__sFILE*)* %dump_flow_info > int (...)* %dump_flow_info > WARNING: Found global types that are not compatible: > int (...)* %expand_expr > %struct.rtx_def* (...)* %expand_expr > %struct.rtx_def* (%union.tree_node*, %struct.rtx_def*, uint, > uint)* %expand_expr > WARNING: Found global types that are not compatible: > %struct.function* (%union.tree_node*)* %find_function_data > { \2, sbyte*, %union.tree_node*, int, int, int, int, int, int, > int, int, int, int, %struct.rtx_def*, %struct.rtx_def*, > %union.tree_node*, int, int, %struct.rtx_def*, int, int, int, > %struct.rtx_def**, int, %struct.rtx_def*, %struct.rtx_def*, > %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, > int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, > %struct.rtx_def*, %union.tree_node*, %struct.rtx_def*, > %union.tree_node*, %union.tree_node*, int, %struct.temp_slot*, int, { > %struct.rtx_def*, uint, int, \2 }*, %struct.nesting*, %struct.nesting*, > %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, > int, int, %union.tree_node*, %struct.rtx_def*, int, sbyte*, int, > %struct.goto_fixup*, int, int, %union.tree_node*, %struct.rtx_def*, > %struct.rtx_def*, %struct.rtx_def*, int, int, %struct.rtx_def*, > %struct.rtx_def*, %union.tree_node*, { %struct.rtx_def*, > %struct.rtx_def*, %union.tree_node*, \2 }*, int, int, sbyte*, sbyte*, > int, %struct.rtx_def**, %union.tree_node*, %union.tree_node*, > %union.tree_node*, %union.tree_node*, %union.tree_node*, int, int, > %struct.momentary_level*, 
sbyte*, sbyte*, sbyte*, sbyte*, > %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, > %struct.obstack*, %struct.obstack*, %struct.simple_obstack_stack*, int, > int, %struct.machine_function*, %struct.rtx_def*, > %struct.constant_descriptor**, %struct.pool_sym**, > %struct.pool_constant*, %struct.pool_constant*, int }* > (%union.tree_node*)* %find_function_data > WARNING: Found global types that are not compatible: > %struct.function** %outer_function_chain > { \2, sbyte*, %union.tree_node*, int, int, int, int, int, int, > int, int, int, int, %struct.rtx_def*, %struct.rtx_def*, > %union.tree_node*, int, int, %struct.rtx_def*, int, int, int, > %struct.rtx_def**, int, %struct.rtx_def*, %struct.rtx_def*, > %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, > int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, > %struct.rtx_def*, %union.tree_node*, %struct.rtx_def*, > %union.tree_node*, %union.tree_node*, int, %struct.temp_slot*, int, { > %struct.rtx_def*, uint, int, \2 }*, %struct.nesting*, %struct.nesting*, > %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, > int, int, %union.tree_node*, %struct.rtx_def*, int, sbyte*, int, > %struct.goto_fixup*, int, int, %union.tree_node*, %struct.rtx_def*, > %struct.rtx_def*, %struct.rtx_def*, int, int, %struct.rtx_def*, > %struct.rtx_def*, %union.tree_node*, { %struct.rtx_def*, > %struct.rtx_def*, %union.tree_node*, \2 }*, int, int, sbyte*, sbyte*, > int, %struct.rtx_def**, %union.tree_node*, %union.tree_node*, > %union.tree_node*, %union.tree_node*, %union.tree_node*, int, int, > %struct.momentary_level*, sbyte*, sbyte*, sbyte*, sbyte*, > %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, > %struct.obstack*, %struct.obstack*, %struct.simple_obstack_stack*, int, > int, %struct.machine_function*, %struct.rtx_def*, > %struct.constant_descriptor**, %struct.pool_sym**, > %struct.pool_constant*, %struct.pool_constant*, int }** > 
%outer_function_chain
> WARNING: While resolving call to function 'gen_call_value' arguments were dropped!
> WARNING: While resolving call to function 'gen_call' arguments were dropped!
> WARNING: While resolving call to function 'gen_call_value' arguments were dropped!
> cc1.cbe.c:1560: warning: conflicting types for built-in function `fprintf'
> cc1.cbe.c:1640: warning: conflicting types for built-in function `sprintf'
> cc1.cbe.c:1757: warning: conflicting types for built-in function `strncmp'
> cc1.cbe.c:1763: warning: conflicting types for built-in function `strchr'
> cc1.cbe.c:1846: warning: conflicting types for built-in function `memcmp'
> cc1.cbe.c:2321: warning: conflicting types for built-in function `strrchr'
> cc1.cbe.c:3048: warning: conflicting types for built-in function `memcpy'
> cc1.cbe.c:3049: warning: conflicting types for built-in function `memset'
> cc1.cbe.c: In function `l2493_recog_5':
> cc1.cbe.c:607726: internal compiler error: in final_scan_insn, at final.c:2189
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <URL:http://developer.apple.com/bugreporter> for instructions.
> specmake options 2> options.err | tee options.out
> COMP: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o options.o -DHOST_WORDS_BIG_ENDIAN -O3
> LINK: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -Wl,-native-cbe -O3 -lm -o options
> Some files did not appear to be built: cc1
> *** Error building 176.gcc
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
On Tue, 4 May 2004, Chris Lattner wrote:
> On Tue, 4 May 2004, Chris Lattner wrote:
> > I suspect that a large reason that LLVM does worst than a native C
> > compiler with the CBE+GCC is that LLVM generates very low-level C code,
> > and I'm not convinced that GCC is doing a very good job (ie, without
> > syntactic loops).
>
> Yup, this is EXACTLY what is going on.
>
> I took this very simple C function:
>
> int Array[1000];
> void test(int X) {
>   int i;
>   for (i = 0; i < 1000; ++i)
>     Array[i] += X;
> }
>
> Compile with -O3 on OS/X gave me this:
>
> _test:
>         mflr r5
>         bcl 20,31,"L00000000001$pb"
> "L00000000001$pb":
>         mflr r2
>         mtlr r5
>         addis r4,r2,ha16(L_Array$non_lazy_ptr-"L00000000001$pb")
>         li r2,0
>         lwz r9,lo16(L_Array$non_lazy_ptr-"L00000000001$pb")(r4)
>         li r4,1000
>         mtctr r4
> L9:
>         lwzx r7,r2,r9    ; load
>         add r6,r7,r3     ; add
>         stwx r6,r2,r9    ; store
>         addi r2,r2,4     ; Increment pointer
>         bdnz L9          ; Decrement count register, branch while not zero
>         blr
>
> This is nice code, good GCC. :)

Okay, I changed the C backend to emit syntactic loops around the real loops, and it seems to make a big difference.
LLVM now generates this code (note that the do/while is not actually responsible for control flow: both arms of the final if are gotos, so the back edge at "while (1)" is unreachable):

void test(int l7_X) {
  unsigned l8_indvar;
  unsigned l8_indvar__PHI_TEMPORARY;
  int *l14_tmp_2E_4;
  int l7_tmp_2E_7;
  unsigned l8_indvar_2E_next;

  l8_indvar__PHI_TEMPORARY = 0u;   /* for PHI node */
  goto l13_no_exit;

  do {     /* Syntactic loop 'no_exit' to make GCC happy */
l13_no_exit:
  l8_indvar = l8_indvar__PHI_TEMPORARY;
  l14_tmp_2E_4 = &Array[l8_indvar];
  l7_tmp_2E_7 = *l14_tmp_2E_4;
  *l14_tmp_2E_4 = (l7_tmp_2E_7 + l7_X);
  l8_indvar_2E_next = l8_indvar + 1u;
  l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next;   /* for PHI node */
  if ((l8_indvar_2E_next == 1000u)) {
    goto l13_return;
  } else {
    goto l13_no_exit;
  }

  } while (1); /* end of syntactic loop 'no_exit' */
l13_return:
  return;
}

Instead of:

> void test(int l7_X) {
>   unsigned l8_indvar;
>   unsigned l8_indvar__PHI_TEMPORARY;
>   int *l14_tmp_2E_5;
>   int l7_tmp_2E_9;
>   unsigned l8_indvar_2E_next;
>
>   l8_indvar__PHI_TEMPORARY = 0u;   /* for PHI node */
>
> l13_no_exit:
>   l8_indvar = l8_indvar__PHI_TEMPORARY;
>   l14_tmp_2E_5 = &Array[l8_indvar];
>   l7_tmp_2E_9 = *l14_tmp_2E_5;
>   *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
>   l8_indvar_2E_next = l8_indvar + 1u;
>   if (!(l8_indvar_2E_next == 1000u)) {
>     l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next;   /* for PHI node */
>     goto l13_no_exit;
>   }
>   return;
> }

The new CBE generated code causes GCC to compile the function into:

_test:
        mflr r5
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r4
        mtlr r5
        addis r2,r4,ha16(_Array-"L00000000001$pb")
        li r4,1000
        mtctr r4
        la r9,lo16(_Array-"L00000000001$pb")(r2)
        li r2,0
L10:
L2:
        lwzx r7,r2,r9
        add r6,r7,r3
        stwx r6,r2,r9
        addi r2,r2,4
        bdnz L10
L7:
        blr

... which is exactly what we want. Patrick, I would appreciate it if you could rerun your tests on PPC and let me know if this helps. :)

-Chris

--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
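For anyone who wants to convince themselves that the wrapping is purely syntactic, here is a minimal sketch in C. It is not the CBE's actual output: the names test_goto, test_syntactic and check_variants are made up for illustration, and the variable names are shortened. It only checks that the goto-only loop and the do/while-wrapped loop leave the same values in Array; it says nothing about what any particular GCC version generates for either one.

```c
#include <assert.h>
#include <string.h>

#define N 1000u
static int Array[N];

/* Old shape: the loop exists only as a label plus a backward goto. */
static void test_goto(int X) {
  unsigned indvar, indvar_phi = 0u, indvar_next;
  int *p;
no_exit:
  indvar = indvar_phi;
  p = &Array[indvar];
  *p = *p + X;
  indvar_next = indvar + 1u;
  if (!(indvar_next == N)) {
    indvar_phi = indvar_next;
    goto no_exit;
  }
}

/* New shape: the same gotos, wrapped in a do/while whose back edge
   at "while (1)" is never reached. */
static void test_syntactic(int X) {
  unsigned indvar, indvar_phi = 0u, indvar_next;
  int *p;
  goto no_exit;
  do {
  no_exit:
    indvar = indvar_phi;
    p = &Array[indvar];
    *p = *p + X;
    indvar_next = indvar + 1u;
    indvar_phi = indvar_next;
    if (indvar_next == N) {
      goto done;
    } else {
      goto no_exit;
    }
  } while (1);
done:
  return;
}

/* Hypothetical helper: run both variants from a zeroed array and
   assert that they produce identical contents. */
static void check_variants(int X) {
  static int expected[N];
  memset(Array, 0, sizeof Array);
  test_goto(X);
  memcpy(expected, Array, sizeof Array);
  memset(Array, 0, sizeof Array);
  test_syntactic(X);
  assert(memcmp(expected, Array, sizeof Array) == 0);
}
```

Calling check_variants(7) from a small main is enough to exercise both variants; comparing the assembly from the two functions at -O3 is then a quick way to see whether a given compiler cares about the syntactic loop.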
Sorry for the delayed response, we're having finals and stuff here so I've been pretty distracted studying for those. I will grab the latest from CVS tonight and rerun that and see what the difference is.

Patrick

On May 9, 2004, at 4:53 PM, Chris Lattner wrote:
> Okay, I changed the C backend to emit syntactic loops around the real
> loops, and it seems to make a big difference.
>
> ... which is exactly what we want. Patrick, I would appreciate it if you
> could rerun your tests on PPC and let me know if this helps. :)
>
> -Chris
> Okay, I changed the C backend to emit syntactic loops around the real
> loops, and it seems to make a big difference. LLVM now generates this
> code (note that the actual loop is not actually responsible for control
> flow, it's unreachable):
>
> ... which is exactly what we want. Patrick, I would appreciate it if you
> could rerun your tests on PPC and let me know if this helps. :)
>
> -Chris

Ok, I ran through the SPEC suite with the new LLVM. Unfortunately, it doesn't seem to have made anything faster. Most of the benchmarks are the same (or within noise) but gzip got even slower. I put together a quick page showing the SPEC scores and their differences between LLVM versions. You can see that here:

www.valtrain.com/files/LLVM-compare.html

Due to the aforementioned finals, I haven't gotten a chance to do any profiling or try to investigate why the syntactic loop thing made it slower, but I'll try to look at that some more this weekend.

Patrick
> Okay, I changed the C backend to emit syntactic loops around the real
> loops, and it seems to make a big difference. LLVM now generates this
> code (note that the actual loop is not actually responsible for control
> flow, it's unreachable):
>
> ... which is exactly what we want. Patrick, I would appreciate it if you
> could rerun your tests on PPC and let me know if this helps. :)

Aside from this syntactic loop stuff, I was looking over gzip some more and found another area that could be improved. In gzip's longest_match function, part of the code generated by the CBE looks like this:

l13_shortcirc_next_2E_11:
  l8_chain_length_2E_039 = (((l2_tmp_2E_182) ? (4294967295u) : (0u))) + l8_chain_length_2E_1;
  .. some other code ...
l13_loopcont_2E_0:
  ... some other code ...
  l2_tmp_2E_182 = l8_tmp_2E_180 > l8_mem_tmp_2E_0;
  if (l2_tmp_2E_182) {
    goto l13_shortcirc_next_2E_11;
  } else {
    goto l13_UnifiedReturnBlock;
  }

Basically it does the comparison once, stores the result in l2_tmp_2E_182, and then uses that variable in both the if check and the ternary expression. When this is compiled, the assembly generated for that comparison/assignment is:

        subc r29,r25,r2
        subfe r29,r29,r29
        neg r29,r29

This is pretty slow compared to just doing the check on the fly (and being able to just use a compare instruction). If I manually edit the code to change it to:

l13_shortcirc_next_2E_11:
  l8_chain_length_2E_039 = (((l8_tmp_2E_180 > l8_mem_tmp_2E_0) ? (4294967295u) : (0u))) + l8_chain_length_2E_1;
  .. some other code ...
l13_loopcont_2E_0:
  if (l8_tmp_2E_180 > l8_mem_tmp_2E_0) {
    goto l13_shortcirc_next_2E_11;
  } else {
    goto l13_UnifiedReturnBlock;
  }

then the assembly generated becomes a cmplw and a branch at the point where the comparison occurs. Making this change in only this one spot cuts the run time by 69 seconds, a 6% speedup over the 5/12 LLVM CVS. I noticed several spots in the CBE code where this type of code was generated, and if the CBE emitted the second form there as well it would presumably help even more.
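To make the two shapes easier to compare outside of gzip, here is a small standalone C sketch. It is only an illustration under assumptions: step_materialized and step_inline are hypothetical names, uint32_t stands in for the CBE's 32-bit unsigned values, and the "taken" flag stands in for which label the goto would reach. The two functions compute the same thing; whether a compiler emits different code for them depends on the compiler version and target.

```c
#include <assert.h>
#include <stdint.h>

/* First shape: the comparison result is stored in a variable and then
   reused by both the branch and the ternary expression. */
static uint32_t step_materialized(uint32_t a, uint32_t b,
                                  uint32_t chain, int *taken) {
  unsigned char cond = a > b;   /* boolean value kept in a variable */
  *taken = 0;
  if (cond) {
    *taken = 1;
    /* adding 4294967295u is a subtraction of 1 modulo 2^32 */
    chain = (cond ? UINT32_C(4294967295) : UINT32_C(0)) + chain;
  }
  return chain;
}

/* Second shape: the comparison is written out at each use, so the
   compiler is free to use a plain compare-and-branch. */
static uint32_t step_inline(uint32_t a, uint32_t b,
                            uint32_t chain, int *taken) {
  *taken = 0;
  if (a > b) {
    *taken = 1;
    chain = ((a > b) ? UINT32_C(4294967295) : UINT32_C(0)) + chain;
  }
  return chain;
}
```

Compiling both functions with -O3 -S and diffing the output is a cheap way to check whether the compiler in use still materializes the boolean in the first shape.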
Lastly, did you ever hear anything back from that group that was working on the PPC JIT compiler? :)

Thanks,
Patrick