I was able to run through all the C/C++ benchmarks in SPEC using LLVM.
I'm on OS X 10.3.3. I did a quick comparison between LLVM (latest from
CVS as of 4/27) and gcc 3.3 (Apple's build 20030304). For simplicity's
sake, the only flag I used was -O3 for each compiler, and I was using
the C backend to generate native code for PPC.

Most of the LLVM results were close to gcc performance (within 5%), but
a few of the tests caught my eye. 164.gzip ran about 25% slower on my
system using LLVM versus gcc. As you said, source-level debugging
information wasn't available for the LLVM binary, but from looking at a
profile of the code, there are two functions that take up a moderate
amount of time (zip and file_read) in the LLVM binary, while these
functions are not in the profile of the gcc code. Is it likely that gcc
would have inlined these?

file_read is relatively small, but zip is a little bigger. I tried to
test this theory by manually editing the gzip code to inline those two
functions, e.g.

  inline int zip( ...
  inline int file_read ( ..

but when I profiled that new code, it still had those two functions in
the profile. Does LLVM support inlining (or am I an idiot and tried to
do it manually wrong)?

Patrick

On May 2, 2004, at 10:40 PM, Chris Lattner wrote:

> On Sun, 2 May 2004, Patrick Flanagan wrote:
>> Is there anything special flagwise that I would need to specify to
>> tell it to include symbol and debug information? I've tried
>> specifying -g but this information still doesn't seem to be
>> included. A quick copy of the build of one of the tests to make sure
>> I've got the flags right:
>
> Nope. Right now LLVM doesn't have real support for source-level
> debugging. There is a debugger *started*, but it needs substantial
> work before it can be usable, and the C front-end cannot produce
> debug information yet. If you're interested in the debugger, it is
> discussed here:
> http://llvm.cs.uiuc.edu/docs/SourceLevelDebugging.html
>
> Sorry!
>
> -Chris
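For reference, a minimal self-contained sketch of the kind of edit
described in the message above (the zip and file_read bodies here are
invented stand-ins, not gzip's real code; the point is that "inline"
is only a hint, which a compiler is free to ignore):

  #include <stdio.h>

  /* Toy stand-ins for gzip's zip() and file_read(); bodies invented. */
  static inline int file_read(char *buf, unsigned size)
  {
      unsigned i;
      for (i = 0; i < size; ++i)
          buf[i] = (char)i;     /* pretend to fill the input window */
      return (int)size;
  }

  static inline int zip(int in, int out)
  {
      char window[256];
      int n = file_read(window, sizeof window);
      return n + in + out;      /* stand-in for the real deflate loop */
  }

  int main(void)
  {
      printf("%d\n", zip(1, 2));
      return 0;
  }

Even with both functions marked inline, a compiler that ignores the
hint (as LLVM did at the time, per the reply below) will still emit
and call them out of line, which is why they keep showing up in the
profile.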
On Tue, 4 May 2004, Patrick Flanagan wrote:

> I was able to run through all the C/C++ benchmarks in SPEC using
> LLVM. I'm on OS X 10.3.3. I did a quick comparison between LLVM
> (latest from CVS as of 4/27) and gcc 3.3 (Apple's build 20030304).
> For simplicity's sake, the only flag I used was -O3 for each
> compiler, and I was using the C backend to generate native code for
> PPC.

Okay, sounds great. Are you using the -native-cbe option? Or are you
running llc -march=c ... and GCC manually?

> Most of the LLVM results were close to gcc performance (within 5%),
> but a few of the tests caught my eye. 164.gzip ran about 25% slower
> on my system using LLVM versus gcc.

Hrm, I really want to figure this out!

> As you said, source-level debugging information wasn't available for
> the LLVM binary, but from looking at a profile of the code, there
> are two functions that take up a moderate amount of time (zip and
> file_read) in the LLVM binary, while these functions are not in the
> profile of the gcc code. Is it likely that gcc would have inlined
> these?

It's quite possible. The best way to check is to look at the .s file
produced by GCC and see if they are there. Note that GCC is much more
aggressive about inlining than LLVM is.

> file_read is relatively small, but zip is a little bigger. I tried
> to test this theory by manually editing the gzip code to inline
> those two functions, e.g.
>
>   inline int zip( ...
>   inline int file_read ( ..
>
> but when I profiled that new code, it still had those two functions
> in the profile. Does LLVM support inlining (or am I an idiot and
> tried to do it manually wrong)?

LLVM supports inlining, and you're not an idiot. :) The problem is
that LLVM doesn't "listen" to "inline" hints at all right now. If you
would like to adjust the inlining thresholds, you can pass
-Wa,-inline-threshold=XXX or -Wl,-inline-threshold=XXX to set the
compile-time or link-time inlining thresholds, respectively. These
both default to 200 (which has no units); if you increase it, the
inliner will inline more. If you want to see what inlining decisions
are being made, pass -debug-only=inline (with -Wa, or -Wl,) to see
what "choices" the inliner is making.

Note that, even without source-level debugging information, you can
still do performance investigation with LLVM. You can either look at
the C code generated by the CBE (which will hurt your eyes: brace
yourself), or you can look at the LLVM code directly, which will be
easier to handle (once you get used to reading LLVM).

I suspect that a large reason that LLVM does worse than a native C
compiler with the CBE+GCC is that LLVM generates very low-level C
code, and I'm not convinced that GCC is doing a very good job with it
(i.e., without syntactic loops).

Please let me know what you find!

-Chris
--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
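A hypothetical invocation of the threshold flags described above might
look like this (the driver name llvmgcc and the exact flag spellings
of that era are assumptions based on the message, not verified):

  # Raise the link-time inlining threshold from its default of 200:
  llvmgcc -O3 -Wl,-inline-threshold=400 gzip.c -o gzip

  # Watch the inliner's link-time decisions:
  llvmgcc -O3 -Wl,-debug-only=inline gzip.c -o gzip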
On Tue, 4 May 2004, Chris Lattner wrote:

> I suspect that a large reason that LLVM does worse than a native C
> compiler with the CBE+GCC is that LLVM generates very low-level C
> code, and I'm not convinced that GCC is doing a very good job with
> it (i.e., without syntactic loops).

Yup, this is EXACTLY what is going on. I took this very simple C
function:

  int Array[1000];

  void test(int X) {
    int i;
    for (i = 0; i < 1000; ++i)
      Array[i] += X;
  }

Compiling with -O3 on OS X gave me this:

  _test:
          mflr r5
          bcl 20,31,"L00000000001$pb"
  "L00000000001$pb":
          mflr r2
          mtlr r5
          addis r4,r2,ha16(L_Array$non_lazy_ptr-"L00000000001$pb")
          li r2,0
          lwz r9,lo16(L_Array$non_lazy_ptr-"L00000000001$pb")(r4)
          li r4,1000
          mtctr r4
  L9:
          lwzx r7,r2,r9   ; load
          add r6,r7,r3    ; add
          stwx r6,r2,r9   ; store
          addi r2,r2,4    ; increment pointer
          bdnz L9         ; decrement count register, branch while not zero
          blr

This is nice code, good GCC. :)

Okay, LLVM currently generates this code from the CBE:

  void test(int l7_X) {
    unsigned l8_indvar;
    unsigned l8_indvar__PHI_TEMPORARY;
    int *l14_tmp_2E_5;
    int l7_tmp_2E_9;
    unsigned l8_indvar_2E_next;

    l8_indvar__PHI_TEMPORARY = 0u;   /* for PHI node */

  l13_no_exit:
    l8_indvar = l8_indvar__PHI_TEMPORARY;
    l14_tmp_2E_5 = &Array[l8_indvar];
    l7_tmp_2E_9 = *l14_tmp_2E_5;
    *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
    l8_indvar_2E_next = l8_indvar + 1u;
    if (!(l8_indvar_2E_next == 1000u)) {
      l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next;   /* for PHI node */
      goto l13_no_exit;
    }
    return;
  }

This has exactly the same operations in the loop, so GCC should
produce the same code, right? Wrong:

  _test:
          mflr r4
          bcl 20,31,"L00000000001$pb"
  "L00000000001$pb":
          mflr r2
          mtlr r4
          li r11,0
          addis r10,r2,ha16(_Array-"L00000000001$pb")
  L2:
          slwi r2,r11,2                            ; shift left "i" by 2
          la r5,lo16(_Array-"L00000000001$pb")(r10)
          cmpwi cr0,r11,999                        ; compare i to the trip count
          lwzx r7,r2,r5                            ; load from array
          addi r11,r11,1                           ; increment "i"
          add r6,r7,r3                             ; add value to array value
          stwx r6,r2,r5                            ; store into array
          bne+ cr0,L2                              ; loop until done
          blr

Hrm, basically gcc is not doing ANY loop optimization (e.g. strength
reduction or "do-loop" optimization) whatsoever. I'm sure that the X86
GCC is suffering from the same problems; it's just that X86 doesn't
depend on strength reduction and do-loop optimization as much, so it's
not so pronounced.

Interestingly, if I tweak the .cbe code to be this:

  do {
    l8_indvar = l8_indvar__PHI_TEMPORARY;
    l14_tmp_2E_5 = &Array[l8_indvar];
    l7_tmp_2E_9 = *l14_tmp_2E_5;
    *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
    l8_indvar_2E_next = l8_indvar + 1u;
    l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next;   /* for PHI node */
  } while (!(l8_indvar_2E_next == 1000u));

GCC generates the nice code again, virtually identical to the code
from the original source. AAAH! :)

Maybe this is a good argument for making the CBE generate syntactic
loops in simple cases. I may have some time to try implementing this
on the weekend. That is, if no one beats me to it. :)

-Chris

--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
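As a sketch of what the strength reduction and do-loop optimization
discussed above buy you, here is the same loop hand-reduced in C; this
is an illustration of the target form, not code the CBE actually
emits:

  int Array[1000];

  void test(int X)
  {
      int *p = Array;
      int *const end = Array + 1000;
      /* The induction variable is now a running pointer, so the loop
         body needs no shift/multiply for indexing; combined with a
         hardware count register, this maps directly onto the
         lwzx/stwx/addi/bdnz loop GCC emits above. */
      do {
          *p += X;
          ++p;
      } while (p != end);
  }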