Maarten ter Huurne
2007-Sep-21 16:42 UTC
[LLVMdev] Compiling zlib to static bytecode archive
Hi,

I'm trying to compile zlib to produce a "libz.a" static library which is an LLVM bytecode archive. I'm using this command line for "configure":

  AR="llvm-ar r" RANLIB=llvm-ranlib CC=llvm-gcc CFLAGS=--emit-llvm \
    ./configure

The creation of "libz.a" works, but after that, zlib's Makefile wants to compile and link some example programs. The linking step fails:

  llvm-gcc --emit-llvm -DNO_vsnprintf -DUSE_MMAP -o example example.o \
    -L. libz.a
  example.o: file not recognized: File format not recognized
  collect2: ld returned 1 exit status

I can link it by hand using llvm-ld instead of llvm-gcc, like this:

  llvm-ld -o example example.o libz.a

However, it is not possible to let the zlib Makefile issue that command without patching the Makefile, because the fragment that does the linking is hardcoded to use the compiler command for linking:

  example$(EXE): example.o $(LIBS)
  	$(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)

Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin and GCC_EXEC_PREFIX=llvm-, but that had no effect.

I'm using LLVM 2.1-pre1 and the corresponding llvm-gcc 4.0.

Bye,
Maarten
On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
> However, it is not possible to let the zlib Makefile issue that command
> without patching the Makefile, because the fragment that does the linking
> is hardcoded to use the compiler command for linking:
>
> example$(EXE): example.o $(LIBS)
> 	$(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)

Right, unfortunately the current Link Time Optimization model requires the linker to "know" about LLVM.
http://llvm.org/docs/LinkTimeOptimization.html

> Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide
> ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin
> and GCC_EXEC_PREFIX=llvm- but that had no effect.

I see two solutions to this. One is to have llvm-gcc call llvm-ld when some option is passed to it. Another would be to enhance 'collect2' to know about LLVM files. 'collect2' is a GCC utility invoked at link time; it would be the perfect place to add hooks.

The thing we're missing most right now is a volunteer to tackle this project :)

-Chris
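Chris's second suggestion, teaching collect2 to recognize LLVM files, starts with telling an LLVM object apart from a native one. A minimal sketch in C, assuming the LLVM 2.x bitcode format (which replaced the older bytecode format in LLVM 2.0) and its magic numbers: raw bitcode begins with the bytes 'B', 'C', 0xC0, 0xDE, and the bitcode wrapper header begins with 0x0B17C0DE stored little-endian. Both function names are hypothetical; nothing like them exists in collect2 or llvm-gcc.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if BUF (LEN bytes) starts with an
 * LLVM bitcode signature, 0 otherwise. */
int is_llvm_bitcode_magic(const unsigned char *buf, size_t len)
{
    if (len < 4)
        return 0;
    /* Raw bitcode: 'B', 'C', 0xC0, 0xDE. */
    if (buf[0] == 'B' && buf[1] == 'C' && buf[2] == 0xC0 && buf[3] == 0xDE)
        return 1;
    /* Bitcode wrapper header: magic 0x0B17C0DE stored little-endian. */
    if (buf[0] == 0xDE && buf[1] == 0xC0 && buf[2] == 0x17 && buf[3] == 0x0B)
        return 1;
    return 0;
}

/* Hypothetical helper: reads the first four bytes of PATH and checks them.
 * Unreadable or too-short files are reported as "not bitcode" (0). */
int is_llvm_bitcode_file(const char *path)
{
    unsigned char buf[4];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return is_llvm_bitcode_magic(buf, n);
}
```

A hook could call is_llvm_bitcode_file() on each object named on the link line and hand the link to llvm-ld if any of them matches. Archives such as libz.a begin with the ar signature "!<arch>\n" instead, so a real hook would also have to look inside archive members; this sketch does not.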
Maarten ter Huurne
2007-Sep-23 10:27 UTC
[LLVMdev] Compiling zlib to static bytecode archive
On Friday 21 September 2007, Chris Lattner wrote:
> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
> > However, it is not possible to let the zlib Makefile issue that command
> > without patching the Makefile, because the fragment that does the linking
> > is hardcoded to use the compiler command for linking:
> >
> > example$(EXE): example.o $(LIBS)
> > 	$(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>
> Right, unfortunately the current Link Time Optimization model
> requires the linker to "know" about LLVM.
> http://llvm.org/docs/LinkTimeOptimization.html

That's the reason I want to try to build a bytecode lib: to see whether link-time optimization of executable + libs has any effect on performance and on code size. My guess is that performance won't improve much, since there aren't that many calls per second that cross the app-lib boundary. But code size could improve if code for unused optional features can be eliminated as dead code because a function is only ever called in one particular way.

By the way, the example from that document does not work with the current llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:

  $ llvm-gcc a.o main.o -o main
  a.o: file not recognized: File format not recognized
  collect2: ld returned 1 exit status

Linking with llvm-ld does work:

  $ llvm-ld a.o main.o -native -o main
  $ ./main
  $ echo $?
  42

The link step combines one or more input files into one output file. The input files can be all bytecode, all native, or mixed. The output file can be bytecode or native. Since it is only possible to convert from bytecode to native and not vice versa, bytecode output requires all-bytecode input. So the combinations are:

- bytecode input, bytecode output: can be handled by llvm-ld without invoking the system compiler/linker.
- native input, native output: handled by the system compiler/linker.
- bytecode or mixed input, native output: according to the llvm-ld man page, llvm-ld will generate native code from the bytecode files and invoke the system compiler to do the actual linking.

> > Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide
> > ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin
> > and GCC_EXEC_PREFIX=llvm- but that had no effect.
>
> I see two solutions to this. One is to have llvm-gcc call llvm-ld
> when some option is passed to it. Another would be to enhance
> 'collect2' to know about LLVM files. 'collect2' is a GCC utility
> invoked at link time; it would be the perfect place to add hooks.

I found the documentation of collect2 here:
http://gcc.gnu.org/onlinedocs/gccint/Collect2.html

Its purpose seems to be to act like ld and insert calls to initialization routines (and exit routines) before calling the real ld. The comment at the top of the source file describes it like this:

  Collect static initialization info into data structures that can be
  traversed by C++ initialization and finalization routines.

According to this comment in the collect2 source, having collect2 accept options that ld does not accept will cause trouble:

  /* !!! When GCC calls collect2,
     it does not know whether it is calling collect2 or ld.
     So collect2 cannot meaningfully understand any options
     except those ld understands.
     If you propose to make GCC pass some other option,
     just imagine what will happen if ld is really ld!!! */

Originally I was under the impression that llvm-ld was just an LLVM-aware version of ld, but that is not the case. For example, when creating an output file in native format, it runs the system compiler on the generated native code, and that compiler automatically picks up libraries such as libc, which must be specified explicitly to ld. Also, although llvm-ld accepts many of the options accepted by ld, GCC passes some ld options that llvm-ld does not accept.
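The input/output combinations listed in this message reduce to a small decision function. The sketch below only illustrates that reasoning; the type and function names are hypothetical. It encodes the rule that bytecode output requires all-bytecode input and spells out the fourth case (native input, bytecode output) as impossible.

```c
#include <stddef.h>

/* Kind of a link input or output file. */
enum file_kind { KIND_BYTECODE, KIND_NATIVE };

/* Which tool chain would handle a given combination. */
enum link_plan {
    PLAN_LLVM_LD_ONLY,        /* all bytecode in, bytecode out: llvm-ld alone */
    PLAN_SYSTEM_LINKER,       /* all native in, native out: system compiler/linker */
    PLAN_LLVM_LD_THEN_SYSTEM, /* bytecode or mixed in, native out: llvm-ld emits
                                 native code, then runs the system compiler */
    PLAN_IMPOSSIBLE           /* native in, bytecode out: no native->bytecode path */
};

enum link_plan choose_link_plan(const enum file_kind *inputs, size_t n,
                                enum file_kind output)
{
    size_t i;
    int any_bytecode = 0, any_native = 0;

    /* Classify the set of inputs as all-bytecode, all-native, or mixed. */
    for (i = 0; i < n; i++) {
        if (inputs[i] == KIND_BYTECODE)
            any_bytecode = 1;
        else
            any_native = 1;
    }

    /* Bytecode output is only possible when every input is bytecode. */
    if (output == KIND_BYTECODE)
        return any_native ? PLAN_IMPOSSIBLE : PLAN_LLVM_LD_ONLY;

    /* Native output: any bytecode input has to go through llvm-ld first. */
    return any_bytecode ? PLAN_LLVM_LD_THEN_SYSTEM : PLAN_SYSTEM_LINKER;
}
```

Only the mixed/bytecode-to-native case needs both tools, which is why it is the scenario used below to compare the two invocation chains.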
Going back to the two options you mentioned, they would lead to the following invocation chains. Let's use the "mixed input, native output" scenario: if we can support that, we can support the rest as well.

llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld

enhanced collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

llvm-collect2 is the enhanced collect2, while plain collect2 is the one that belongs to the system compiler. Note that this assumes the system compiler is GCC; otherwise the "gcc -> collect2 -> ld" chain will be something else, but it will perform the same function.

Since llvm-ld invokes the system compiler to do the actual linking, the executable it produces will already have the proper init/exit sequences. So llvm-collect2 would have nothing left to do.

To summarize:
- llvm-ld (currently) does not accept all flags that GCC passes to collect2
- an LLVM-aware collect2 would never perform the core function of collect2, which is generating init/exit code and data

Therefore, I think the scenario of llvm-gcc calling llvm-ld directly is preferable.

> The thing we're missing most right now is a volunteer to tackle this
> project :)

Since this is all new terrain for me, I might get stuck before producing anything useful. But I'm willing to try.

Bye,
Maarten
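For the preferred route, llvm-gcc calling llvm-ld directly, the driver would have to turn its link step into an llvm-ld command. A minimal sketch, assuming the driver already knows its input files and output name, that builds the same command shape that worked by hand in this thread (llvm-ld a.o main.o -native -o main). build_llvm_ld_argv() is hypothetical, not existing llvm-gcc code, and it forwards only the inputs and the output name because llvm-ld does not accept every option GCC passes to collect2.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical driver helper: builds the NULL-terminated vector
 *   llvm-ld <inputs...> -native -o <output>
 * The strings are borrowed, not copied; the caller frees only the array.
 * Returns NULL if allocation fails. */
char **build_llvm_ld_argv(const char *const *inputs, size_t n_inputs,
                          const char *output)
{
    size_t count = 1 + n_inputs + 3; /* "llvm-ld", inputs, "-native", "-o", output */
    char **argv = malloc((count + 1) * sizeof *argv);
    size_t i, k = 0;

    if (argv == NULL)
        return NULL;

    /* Casts drop const so the result matches execvp's char *const argv[]. */
    argv[k++] = (char *)"llvm-ld";
    for (i = 0; i < n_inputs; i++)
        argv[k++] = (char *)inputs[i];
    argv[k++] = (char *)"-native";
    argv[k++] = (char *)"-o";
    argv[k++] = (char *)output;
    argv[k] = NULL;
    return argv;
}
```

A driver could pass the result to execvp(argv[0], argv) from <unistd.h>, freeing the array if the exec fails. Dropping every other flag is the simplest way around the option mismatch described above; a real implementation would probably need to translate flags such as -L and -l rather than discard them.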