On 13 Jan 2010, at 16:43, Nick Lewycky wrote:

> Mark Muir wrote:
>> - Run the existing Clang tool on each source file, using -emit-llvm to generate a .bc file for each module.
>> - Run llvm-link to merge them into a single .bc file.
>> - Run llc to generate a complete machine assembly.
>>
>> However, with optimisations enabled, the resulting code is not as efficient as it would be if all the code were in a single module. In particular, function inlining is only performed by clang (i.e. only on a module-by-module basis), and not by llvm-link or llc.
>
> It sounds like you're not running the LTO optimizations. You could try replacing llvm-link with llvm-ld which will, or run 'opt -std-link-opts' between llvm-link and llc.

Yep, that sorted inlining. Thanks.

But... now there's a small problem with library calls. Symbols such as 'memset', 'malloc', etc. are being removed by global dead code elimination. They are implemented in one of the bitcode modules that are linked together (implementations are based on newlib).

I get the same behaviour of them being stripped even when they are live, by the following:

    opt -internalize -globaldce

Other (not standard-library) functions implemented in different modules than where they are called, are correctly seen as live. So, could this be something to do with what is declared as a built-in? I haven't provided any list of built-ins (or overridden the defaults), nor could I figure out how exactly to do that.

I've also noticed other problems related to built-ins - in one example, code made use of abs(), but didn't #include <stdlib.h>. The resulting code compiled without warning or error, but the resulting code was broken, due to the arguments not being seen as live, e.g.:

Without #include <stdlib.h>:

    0x181e8b0: i32 = TargetGlobalAddress <i32 (...)* @abs> 0 [TF=1]
    => JUMP_CALLi <ga:abs>[TF=1], %r2<imp-def>, %r3<imp-def>, %r4<imp-def,dead>, %r5<imp-def,dead>, %r6<imp-def,dead>, %r7<imp-def,dead>, %r8<imp-def,dead>, %r9<imp-def,dead>, %r10<imp-def,dead>

With #include <stdlib.h>:

    0x181e8b0: i32 = TargetGlobalAddress <i32 (i32)* @abs> 0 [TF=1]
    => JUMP_CALLi <ga:abs>[TF=1], %r3<kill>, %r2<imp-def>, %r3<imp-def>, %r4<imp-def,dead>, %r5<imp-def,dead>, %r6<imp-def,dead>, %r7<imp-def,dead>, %r8<imp-def,dead>, %r9<imp-def,dead>, %r10<imp-def,dead>

Where r2 is the link register, and r3 to r10 are argument/retval registers. LowerFormalArguments() doesn't see any arguments in the former, and consequently doesn't add input register nodes to the DAG.

I guess I need help with the concept of built-ins, and what code is related to them in the Clang driver and back-end.

Regards,

- Mark
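For readers following the abs() case: the two dumps differ only in the callee type, i32 (...)* versus i32 (i32)*. That matches what C does when no prototype is visible: under C89 rules the call implicitly declares int abs(), an unprototyped function. The following is a minimal sketch of the prototyped form, not the code from the original report; magnitude() and distance() are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h> /* supplies the prototype: int abs(int); */

/* magnitude() is a hypothetical helper, not taken from the thread. */
int magnitude(int x)
{
    assert(x != INT_MIN); /* abs(INT_MIN) is undefined behaviour */
    return abs(x);        /* prototype is in scope, so the call has a fixed int parameter */
}

/* distance() is likewise hypothetical; it only exercises magnitude(). */
int distance(int a, int b)
{
    return magnitude(a - b);
}
```

Building with -Wimplicit-function-declaration (accepted by both Clang and GCC) should flag the version that omits the header, which may be a cheaper way to catch the problem than inspecting the selection DAG.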
On 13 January 2010 12:05, Mark Muir <mark.i.r.muir at gmail.com> wrote:

> On 13 Jan 2010, at 16:43, Nick Lewycky wrote:
>
> > Mark Muir wrote:
> >> - Run the existing Clang tool on each source file, using -emit-llvm to generate a .bc file for each module.
> >> - Run llvm-link to merge them into a single .bc file.
> >> - Run llc to generate a complete machine assembly.
> >>
> >> However, with optimisations enabled, the resulting code is not as efficient as it would be if all the code were in a single module. In particular, function inlining is only performed by clang (i.e. only on a module-by-module basis), and not by llvm-link or llc.
> >
> > It sounds like you're not running the LTO optimizations. You could try replacing llvm-link with llvm-ld which will, or run 'opt -std-link-opts' between llvm-link and llc.
>
> Yep, that sorted inlining. Thanks.
>
> But... now there's a small problem with library calls. Symbols such as 'memset', 'malloc', etc. are being removed by global dead code elimination. They are implemented in one of the bitcode modules that are linked together (implementations are based on newlib).

And what problems does that cause? If malloc is linked in, we're free to inline it everywhere and delete the symbol. If you meant for it to be visible to the optimizers but you don't want it to be part of the code generated for your program (i.e., you'll link it against newlib later), you should mark the functions with available_externally linkage.

> I get the same behaviour of them being stripped even when they are live, by the following:
>
>     opt -internalize -globaldce
>
> Other (not standard-library) functions implemented in different modules than where they are called, are correctly seen as live. So, could this be something to do with what is declared as a built-in? I haven't provided any list of built-ins (or overridden the defaults), nor could I figure out how exactly to do that.

Alternately, if you wanted malloc, memset and friends to be externally visible (compiled as part of your program and dlsym'able), you could create a public API file which contains a one-per-line list of the names of the functions that may not be marked internal linkage by internalize. Pass that in to opt with -internalize-public-api-file filename ...other flags...

Nick

> I've also noticed other problems related to built-ins - in one example, code made use of abs(), but didn't #include <stdlib.h>. The resulting code compiled without warning or error, but the resulting code was broken, due to the arguments not being seen as live, e.g.:
>
> Without #include <stdlib.h>:
>
>     0x181e8b0: i32 = TargetGlobalAddress <i32 (...)* @abs> 0 [TF=1]
>     => JUMP_CALLi <ga:abs>[TF=1], %r2<imp-def>, %r3<imp-def>, %r4<imp-def,dead>, %r5<imp-def,dead>, %r6<imp-def,dead>, %r7<imp-def,dead>, %r8<imp-def,dead>, %r9<imp-def,dead>, %r10<imp-def,dead>
>
> With #include <stdlib.h>:
>
>     0x181e8b0: i32 = TargetGlobalAddress <i32 (i32)* @abs> 0 [TF=1]
>     => JUMP_CALLi <ga:abs>[TF=1], %r3<kill>, %r2<imp-def>, %r3<imp-def>, %r4<imp-def,dead>, %r5<imp-def,dead>, %r6<imp-def,dead>, %r7<imp-def,dead>, %r8<imp-def,dead>, %r9<imp-def,dead>, %r10<imp-def,dead>
>
> Where r2 is the link register, and r3 to r10 are argument/retval registers. LowerFormalArguments() doesn't see any arguments in the former, and consequently doesn't add input register nodes to the DAG.
>
> I guess I need help with the concept of built-ins, and what code is related to them in the Clang driver and back-end.
>
> Regards,
>
> - Mark
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On 13 Jan 2010, at 20:34, Nick Lewycky wrote:

> On 13 January 2010 12:05, Mark Muir <mark.i.r.muir at gmail.com> wrote:
>
> > But... now there's a small problem with library calls. Symbols such as 'memset', 'malloc', etc. are being removed by global dead code elimination. They are implemented in one of the bitcode modules that are linked together (implementations are based on newlib).
>
> And what problems does that cause? If malloc is linked in, we're free to inline it everywhere and delete the symbol. If you meant for it to be visible to the optimizers but you don't want it to be part of the code generated for your program (ie., you'll link it against newlib later), you should mark the functions with available_externally linkage.

Sorry, I should've been more clear - the calls to _malloc and _free weren't being inlined (see example below). I'm not sure why (happens with or without -simplify-libcalls). So, the resulting .bc file from 'opt' contains live references to symbols that were in its input .bc, but for some reason it stripped them.

    #include <stdlib.h>

    int entries = 3;
    int result;

    int main()
    {
        int i;

        // Allocate and populate the initial array.
        int* values = malloc(entries * sizeof(int));
        for (i = 0; i < entries; i ++)
            values[i] = i + 1;

        // Calculate the sum, using a dynamically allocated accumulator.
        int* acc = malloc(sizeof(int));
        *acc = 0;
        for (i = 0; i < entries; i ++)
            *acc += values[i];
        result = *acc;

        // Deallocate the memory.
        free(values);
        free(acc);

        return 0;
    }

Here's a fragment of the final machine assembly (with -O3):

    _main:
        ADDCOMP    out=r1 in1=r1 in2=4 conf=`ADDCOMP_SUB
        WMEM       in=r2 in_addr=r1 conf=`WMEM_SI
        CONST_16B  out=r3 conf=12
        JUMP       nl_out=r2/*RA*/ addr_in=&_malloc conf=`JUMP_ALWAYS_ABS // Call

In case this is important, here are the relevant declarations from the 'stdlib.h' that is in use:

    _PTR  _EXFUN(malloc,(size_t __size));
    _VOID _EXFUN(free,(_PTR));

where:

    #define _PTR void *
    #define _EXFUN(name, proto) name proto

and from 'newlib.c':

    void *
    malloc (size_t sz)
    {
        ...
    }

i.e. they look like any other function call, which is why I suspect it has something to do with special behaviour given to built-ins.

> Alternately, if you wanted malloc, memset and friends to be externally visible (compiled as part of your program and dlsym'able), you could create a public api file which contains a one per line list of the names of the functions that may not be marked internal linkage by internalize. Pass that in to opt with -internalize-public-api-file filename ...other flags...

I saw that. I was thinking of only using that option as a last resort, due to maintainability.

> > I guess I need help with the concept of built-ins, and what code is related to them in the Clang driver and back-end.

Thanks.

- Mark
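To make the point about the declarations concrete, here is a small self-contained sketch of how the quoted macros expand into ordinary prototypes. It is an illustration only, under these assumptions: demo_alloc, demo_free and sum_via_heap are hypothetical names (chosen so the sketch does not clash with the host C library), the definition of _VOID as void is assumed because the message does not quote it, and the wrappers simply forward to the host malloc and free rather than to a newlib implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* _PTR and _EXFUN as quoted in the message; _VOID is an assumed definition. */
#undef _PTR
#undef _VOID
#undef _EXFUN
#define _PTR void *
#define _VOID void
#define _EXFUN(name, proto) name proto

/* Hypothetical stand-ins for the malloc/free declarations.
   After preprocessing these two lines read:
       void * demo_alloc (size_t size);
       void demo_free (void *);
   i.e. plain external function prototypes. */
_PTR  _EXFUN(demo_alloc, (size_t size));
_VOID _EXFUN(demo_free, (_PTR));

/* Definitions matching the expanded prototypes; they forward to the host libc. */
void *demo_alloc(size_t size) { return malloc(size); }
void demo_free(void *p) { free(p); }

/* Mirrors the test program: sums 1..n through heap storage.
   Returns -1 if either allocation fails. */
int sum_via_heap(int n)
{
    int i;
    int result = -1;
    int *values;
    int *acc;

    assert(n > 0);
    values = demo_alloc((size_t)n * sizeof(int));
    acc = demo_alloc(sizeof(int));
    if (values != NULL && acc != NULL) {
        *acc = 0;
        for (i = 0; i < n; i++)
            values[i] = i + 1;
        for (i = 0; i < n; i++)
            *acc += values[i];
        result = *acc;
    }
    demo_free(values); /* freeing NULL is a no-op */
    demo_free(acc);
    return result;
}
```

Nothing in the expanded declarations marks the functions as special, so any different treatment of malloc and free would have to come from name-based recognition in the compiler or optimizer, which is the suspicion raised in the message.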