Maarten ter Huurne
2007-Sep-23 10:27 UTC
[LLVMdev] Compiling zlib to static bytecode archive
On Friday 21 September 2007, Chris Lattner wrote:
> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
> > However, it is not possible to let the zlib Makefile issue that
> > command without patching the Makefile, because the fragment that
> > does the linking is hardcoded to use the compiler command for
> > linking:
> >
> > example$(EXE): example.o $(LIBS)
> >         $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>
> Right, unfortunately the current Link Time Optimization model
> requires the linker to "know" about LLVM.
> http://llvm.org/docs/LinkTimeOptimization.html

That's the reason I want to try and build a bytecode lib: to see if link time optimization of executable + libs has any effect on performance and on code size.

My guess is that performance won't improve much, since there aren't that many calls per second which cross the app-lib boundary. But code size could improve if unused optional features can be eliminated as dead code because a function is only called in one particular way.

By the way, the example from that document does not work with the current llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:

$ llvm-gcc a.o main.o -o main
a.o: file not recognized: File format not recognized
collect2: ld returned 1 exit status

Linking with llvm-ld does work:

$ llvm-ld a.o main.o -native -o main
$ ./main
$ echo $?
42

The link step combines one or more input files into one output file. The input files can be all bytecode, all native or mixed. The output file can be bytecode or native. Since it is only possible to convert from bytecode to native and not vice versa, bytecode output requires all bytecode input. So the combinations are:

bytecode input, bytecode output:
  Can be handled by llvm-ld without invoking system compiler/linker.
native input, native output:
  Handled by system compiler/linker.
bytecode or mixed input, native output:
  According to the llvm-ld man page, llvm-ld will generate native code from the bytecode files and invoke the system compiler to do the actual linking.

> > Would it be possible to make llvm-gcc call llvm-ld instead of the
> > systemwide ld? I tried setting the environment variables
> > COMPILER_PATH=/usr/local/bin and GCC_EXEC_PREFIX=llvm- but that
> > had no effect.
>
> I see two solutions to this. One is to have llvm-gcc call llvm-ld
> when it has some option passed to it. Another would be to enhance
> 'collect2' to know about LLVM files. 'collect2' is a GCC utility
> invoked at link time, it would be the perfect place to add hooks.

I found the documentation of collect2 here:
http://gcc.gnu.org/onlinedocs/gccint/Collect2.html

Its purpose seems to be to act like ld and insert calls to initialization routines (and exit routines) before calling the real ld. The comment at the top of the source file describes it like this:

  Collect static initialization info into data structures that can be
  traversed by C++ initialization and finalization routines.

According to this comment in the collect2 source, having collect2 accept options that ld does not accept will cause trouble:

/* !!! When GCC calls collect2,
   it does not know whether it is calling collect2 or ld.
   So collect2 cannot meaningfully understand any options
   except those ld understands.
   If you propose to make GCC pass some other option,
   just imagine what will happen if ld is really ld!!!  */

Originally I was under the impression that llvm-ld was just an LLVM-aware version of ld, but that is not the case. For example, when creating an output file in native format, it runs the system compiler on the generated native code and that compiler automatically picks up libraries such as libc, which must be specified explicitly to ld. Also, although llvm-ld accepts many of the options accepted by ld, GCC uses some ld options that llvm-ld does not accept.

Going back to the two options you mentioned, they would lead to the following invocation chains. Let's use the "mixed input, native output" scenario: if we can support that, we can support the rest as well.

llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld

enhance collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

llvm-collect2 is the enhanced collect2, while plain collect2 is the one that belongs to the system compiler. Note that this assumes the system compiler is GCC; otherwise the "gcc -> collect2 -> ld" chain will be something else, but will perform the same function.

Since llvm-ld invokes the system compiler to do the actual linking, the executable it produces will already have the proper init/exit sequences. So llvm-collect2 would not have anything to do.

To summarize:
- llvm-ld (currently) does not accept all flags that GCC passes to collect2
- an LLVM-aware collect2 would never perform the core function of collect2, which is generating init/exit code and data

Therefore, I think the scenario of llvm-gcc calling llvm-ld directly is preferable.

> The thing we're missing most right now is a volunteer to tackle this
> project :)

Since this is all new terrain for me, I might get stuck before producing anything useful. But I'm willing to try.

Bye,
Maarten
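
[Editorial sketch, not part of the original message.] The input/output combinations listed in this message can be written down as a small decision function. This is only an illustration of the reasoning: the names `Fmt` and `link_handler` are hypothetical, and the returned strings merely summarize what the message says about who handles each case.

```python
from enum import Enum


class Fmt(Enum):
    BYTECODE = "bytecode"
    NATIVE = "native"


def link_handler(input_formats, output_format):
    """Say which tool handles a link, per the combinations in the message.

    input_formats: iterable of Fmt values, one per input file.
    output_format: the Fmt requested for the output file.
    """
    inputs = set(input_formats)
    if not inputs:
        raise ValueError("at least one input file is required")

    if output_format is Fmt.BYTECODE:
        # Native code cannot be converted back to bytecode, so bytecode
        # output is only possible when every input is bytecode.
        if inputs != {Fmt.BYTECODE}:
            raise ValueError("bytecode output requires all-bytecode input")
        return "llvm-ld, without invoking the system compiler/linker"

    if inputs == {Fmt.NATIVE}:
        return "system compiler/linker"

    # Bytecode-only or mixed input with native output.
    return "llvm-ld generates native code, then the system compiler links"


if __name__ == "__main__":
    print(link_handler([Fmt.BYTECODE, Fmt.NATIVE], Fmt.NATIVE))
```

The mixed case is the interesting one for the discussion that follows, since it is the only one that needs both LLVM tooling and the system linker in the same invocation.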
On Sep 23, 2007, at 3:27 AM, Maarten ter Huurne wrote:
> On Friday 21 September 2007, Chris Lattner wrote:
>> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
>>> However, it is not possible to let the zlib Makefile issue that
>>> command without patching the Makefile, because the fragment that
>>> does the linking is hardcoded to use the compiler command for
>>> linking:
>>>
>>> example$(EXE): example.o $(LIBS)
>>>         $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>>
>> Right, unfortunately the current Link Time Optimization model
>> requires the linker to "know" about LLVM.
>> http://llvm.org/docs/LinkTimeOptimization.html
>
> That's the reason I want to try and build a bytecode lib: to see if
> link time optimization of executable + libs has any effect on
> performance and on code size.

Right.

> My guess is that performance won't improve much, since there aren't
> that many calls per second which cross the app-lib boundary. But
> code size could improve if unused optional features can be
> eliminated as dead code because a function is only called in one
> particular way.

Makes sense!

> By the way, the example from that document does not work with the
> current llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:
>
> $ llvm-gcc a.o main.o -o main
> a.o: file not recognized: File format not recognized
> collect2: ld returned 1 exit status

Again, this is because your native linker doesn't support liblto.

> Linking with llvm-ld does work:
>
> $ llvm-ld a.o main.o -native -o main
> $ ./main
> $ echo $?
> 42
>
> The link step combines one or more input files into one output
> file. The input files can be all bytecode, all native or mixed. The
> output file can be bytecode or native. Since it is only possible to
> convert from bytecode to native and not vice versa, bytecode output
> requires all bytecode input.
> So the combinations are:
>
> bytecode input, bytecode output:
>   Can be handled by llvm-ld without invoking system compiler/linker.

Yes, but note that this only works if you limit yourself to linker options known by llvm-ld. If you use funky stuff, llvm-ld won't be able to handle it. Also, llvm-ld may or may not handle archive resolution correctly (I don't remember).

> native input, native output:
>   Handled by system compiler/linker.
>
> bytecode or mixed input, native output:
>   According to the llvm-ld man page, llvm-ld will generate native
>   code from the bytecode files and invoke the system compiler to do
>   the actual linking.

Yes.

>>> Would it be possible to make llvm-gcc call llvm-ld instead of the
>>> systemwide ld? I tried setting the environment variables
>>> COMPILER_PATH=/usr/local/bin and GCC_EXEC_PREFIX=llvm- but that
>>> had no effect.
>>
>> I see two solutions to this. One is to have llvm-gcc call llvm-ld
>> when it has some option passed to it. Another would be to enhance
>> 'collect2' to know about LLVM files. 'collect2' is a GCC utility
>> invoked at link time, it would be the perfect place to add hooks.
>
> I found the documentation of collect2 here:
> http://gcc.gnu.org/onlinedocs/gccint/Collect2.html
>
> Its purpose seems to be to act like ld and insert calls to
> initialization routines (and exit routines) before calling the real
> ld. The comment at the top of the source file describes it like
> this:
>
>   Collect static initialization info into data structures that can
>   be traversed by C++ initialization and finalization routines.

Right, that is its intended purpose. It seems fairly straightforward to abuse it for our devious plans though :)

> Originally I was under the impression that llvm-ld was just an
> LLVM-aware version of ld, but that is not the case.
> For example, when creating an output file in native format, it runs
> the system compiler on the generated native code and that compiler
> automatically picks up libraries such as libc, which must be
> specified explicitly to ld. Also, although llvm-ld accepts many of
> the options accepted by ld, GCC uses some ld options that llvm-ld
> does not accept.

Right.

> Going back to the two options you mentioned, they would lead to the
> following invocation chains. Let's use the "mixed input, native
> output" scenario: if we can support that, we can support the rest
> as well.
>
> llvm-gcc calling llvm-ld:
>   llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld
>
> enhance collect2:
>   llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

I'd rather enhance collect2 like this:

  llvm-gcc -> llvm-collect2(liblto) -> ld

Where llvm-collect2 is just collect2 that dlopen's liblto to do the optimization work. This makes it work much more naturally than adding a whole new set of steps. Depending on llvm-ld will never get you to a world where LTO is transparent, because llvm-ld doesn't support a lot of options and features that native linkers do.

> To summarize:
> - llvm-ld (currently) does not accept all flags that GCC passes to
>   collect2
> - an LLVM-aware collect2 would never perform the core function of
>   collect2, which is generating init/exit code and data
>
> Therefore, I think the scenario of llvm-gcc calling llvm-ld directly
> is preferable.

Ah, but if the llvm-collect2 version was enhanced to do everything it does now, and additionally interface with liblto, then everyone wins :)

>> The thing we're missing most right now is a volunteer to tackle this
>> project :)
>
> Since this is all new terrain for me, I might get stuck before
> producing anything useful. But I'm willing to try.

Yay! Many people will appreciate this!

-Chris
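
[Editorial sketch, not part of the original message.] A collect2 that "knows about LLVM files" first has to tell them apart from native objects. The thread does not say how that would be done; one plausible approach is to look at the first bytes of each input. The sketch below assumes the LLVM bitcode magic number (the bytes 'B', 'C', 0xC0, 0xDE) and ELF native objects. Both are assumptions about the file formats in use: older bytecode files and non-ELF platforms would need different checks, and `classify` and `classify_file` are hypothetical names.

```python
# Assumed magic numbers; not taken from the thread.
BITCODE_MAGIC = b"BC\xc0\xde"  # LLVM bitcode wrapper-less header
ELF_MAGIC = b"\x7fELF"         # ELF object files and executables


def classify(header: bytes) -> str:
    """Classify a file from its leading bytes as 'llvm', 'native' or 'unknown'."""
    if header.startswith(BITCODE_MAGIC):
        return "llvm"
    if header.startswith(ELF_MAGIC):
        return "native"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return classify(f.read(4))


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, classify_file(name))
```

Anything reported as "unknown" (archives, linker scripts, other object formats) would presumably be passed through to the native linker untouched.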
Maarten ter Huurne
2007-Sep-26 01:12 UTC
[LLVMdev] Compiling zlib to static bytecode archive
On Wednesday 26 September 2007, Chris Lattner wrote:
> > llvm-gcc calling llvm-ld:
> >   llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld
> >
> > enhance collect2:
> >   llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld
>
> I'd rather enhance collect2 like this:
>
>   llvm-gcc -> llvm-collect2(liblto) -> ld
>
> Where llvm-collect2 is just collect2 that dlopen's liblto to do the
> optimization work. This makes it work much more naturally than adding
> a whole new set of steps. Depending on llvm-ld will never get you to
> a world where LTO is transparent, because llvm-ld doesn't support a
> lot of options and features that native linkers do.

So the llvm-collect2 will combine the functionality of the original collect2 and of llvm-ld? When it executes, it would take the following steps:

1. for each input, determine whether it is in bytecode or native format
2. if there are no bytecode inputs, go to step 6
3. link the bytecode inputs and optimize the resulting bytecode, using liblto
4. if bytecode output was requested, we are done
5. generate native object in a temporary file
6. perform the init/exit fixups that the original collect2 does
7. invoke system linker to link the generated native object (if any) and the input native objects (if any)

Assuming those steps are correct, steps 6 and 7 could be implemented by using the original collect2 and adding the generated native object to the list of files to link. In other words, llvm-collect2 could be a separate process, which is called instead of collect2, does some processing and then runs the original, unmodified collect2:

  llvm-gcc -> llvm-collect2(liblto) -> collect2 -> ld

Bye,
Maarten
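
[Editorial sketch, not part of the original message.] The seven steps above can be modelled as a planning function that decides which actions a wrapper would take, without running any tools. Everything here is illustrative: `plan_link`, the action names and the temporary file name `lto-tmp.o` are hypothetical, the inputs are assumed to be classified already (step 1), and the final action stands for handing steps 6 and 7 to the original, unmodified collect2 as the message proposes.

```python
TEMP_OBJECT = "lto-tmp.o"  # hypothetical name for the generated native object


def plan_link(inputs, want_bytecode_output=False):
    """Return the list of actions for a link, following steps 1-7.

    inputs: list of (filename, kind) pairs, kind being "bytecode" or
            "native" (the result of step 1).
    """
    if not inputs:
        raise ValueError("no input files")
    for name, kind in inputs:
        if kind not in ("bytecode", "native"):
            raise ValueError(f"unknown kind {kind!r} for {name}")

    bytecode = [name for name, kind in inputs if kind == "bytecode"]
    native = [name for name, kind in inputs if kind == "native"]

    if want_bytecode_output and native:
        raise ValueError("bytecode output requires all-bytecode input")

    steps = []
    objects = list(native)

    if bytecode:  # step 2: skip straight to collect2 when there is none
        # Step 3: link and optimize the bytecode inputs (via liblto).
        steps.append(("lto-link-and-optimize", bytecode))
        if want_bytecode_output:  # step 4: nothing more to do
            return steps
        # Step 5: generate a native object in a temporary file.
        steps.append(("codegen", TEMP_OBJECT))
        objects.append(TEMP_OBJECT)

    # Steps 6 and 7: the original collect2 does the init/exit fixups
    # and invokes the system linker on all native objects.
    steps.append(("collect2", objects))
    return steps


if __name__ == "__main__":
    for step in plan_link([("a.o", "bytecode"), ("main.o", "native")]):
        print(step)
```

With native-only input the plan collapses to a single collect2 call, which matches the observation that the wrapper can stay out of the way when no bytecode is involved.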