thr3ads.net - llvm dev - [LLVMdev] Compiling zlib to static bytecode archive [Sep 2007]

Maarten ter Huurne

2007-Sep-23 10:27 UTC

[LLVMdev] Compiling zlib to static bytecode archive

On Friday 21 September 2007, Chris Lattner wrote:
> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
> > However, it is not possible to let the zlib Makefile issue that command
> > without patching the Makefile, because the fragment that does the linking
> > is hardcoded to use the compiler command for linking:
> >
> >   example$(EXE): example.o $(LIBS)
> >           $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>
> Right, unfortunately the current Link Time Optimization model
> requires the linker to "know" about LLVM.
> http://llvm.org/docs/LinkTimeOptimization.html

That's the reason I want to try and build a bytecode lib: to see if link time
optimization of executable + libs has any effect on performance and on code
size. My guess is that performance won't improve much, since there aren't
that many calls per second which cross the app-lib boundary. But code size
could improve if unused optional features can be eliminated as dead code
because a function is only called in one particular way.

By the way, the example from that document does not work with the current 
llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:

$ llvm-gcc a.o main.o -o main
a.o: file not recognized: File format not recognized
collect2: ld returned 1 exit status

Linking with llvm-ld does work:

$ llvm-ld a.o main.o -native -o main
$ ./main
$ echo $?
42

The link step combines one or more input files into one output file. The input 
files can be all bytecode, all native or mixed. The output file can be 
bytecode or native. Since it is only possible to convert from bytecode to 
native and not vice versa, bytecode output requires all bytecode input. So 
the combinations are:

bytecode input, bytecode output:
Can be handled by llvm-ld without invoking system compiler/linker.

native input, native output:
Handled by system compiler/linker.

bytecode or mixed input, native output:
According to the llvm-ld man page, llvm-ld will generate native code from the 
bytecode files and invoke the system compiler to do the actual linking.
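
The three combinations above can be written down as a small decision function. This is only an illustrative sketch in Python: the names `Kind` and `link_strategy` are invented for this example and are not part of LLVM, llvm-ld or GCC. It only encodes the rule that bytecode output requires all-bytecode input and reports which tool would handle the link.

```python
from enum import Enum


class Kind(Enum):
    """File format of a link input or output (hypothetical helper type)."""
    BYTECODE = "bytecode"
    NATIVE = "native"


def link_strategy(inputs, output):
    """Return which tool handles a link, following the three cases above.

    `inputs` is an iterable of Kind values, `output` is a single Kind.
    """
    kinds = set(inputs)
    if not kinds:
        raise ValueError("at least one input file is required")
    if output is Kind.BYTECODE:
        # Native code cannot be converted back to bytecode.
        if kinds != {Kind.BYTECODE}:
            raise ValueError("bytecode output requires all-bytecode input")
        return "llvm-ld alone"
    if kinds == {Kind.NATIVE}:
        return "system compiler/linker"
    # Bytecode or mixed input, native output.
    return "llvm-ld generates native code, then the system compiler links"
```

For instance, `link_strategy([Kind.BYTECODE, Kind.NATIVE], Kind.NATIVE)` falls into the third case, while asking for bytecode output from a native input raises `ValueError`.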

> > Would it be possible to make llvm-gcc call llvm-ld instead of the
> > systemwide ld? I tried setting the environment variables
> > COMPILER_PATH=/usr/local/bin and GCC_EXEC_PREFIX=llvm- but that had
> > no effect.
>
> I see two solutions to this.  One is to have llvm-gcc call llvm-ld
> when it has some option passed to it. Another would be to enhance
> 'collect2' to know about LLVM files.  'collect2' is a GCC utility
> invoked at link time, it would be the perfect place to add hooks.

I found the documentation of collect2 here:
  http://gcc.gnu.org/onlinedocs/gccint/Collect2.html

Its purpose seems to be to act like ld and insert calls to initialization 
routines (and exit routines) before calling the real ld. The comment at the 
top of the source file describes it like this:

   Collect static initialization info into data structures that can be
   traversed by C++ initialization and finalization routines.

According to this comment in the collect2 source, having collect2 accept 
options that ld does not accept will cause trouble:

  /* !!! When GCC calls collect2,
     it does not know whether it is calling collect2 or ld.
     So collect2 cannot meaningfully understand any options
     except those ld understands.
     If you propose to make GCC pass some other option,
     just imagine what will happen if ld is really ld!!!  */

Originally I was under the impression that llvm-ld was just an LLVM-aware 
version of ld, but that is not the case. For example, when creating an output 
file in native format, it runs the system compiler on the generated native 
code and that compiler automatically picks up libraries such as libc, which 
must be specified explicitly to ld. Also, although llvm-ld accepts many of 
the options accepted by ld, GCC uses some ld options that llvm-ld does not 
accept.

Going back to the two options you mentioned, they would lead to the following
invocation chains. Let's use the "mixed input, native output" scenario: if we
can support that, we can support the rest as well.

llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld

enhance collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

llvm-collect2 is the enhanced collect2, while plain collect2 is the one that
belongs to the system compiler. Note that this assumes the system compiler is
GCC; otherwise the "gcc -> collect2 -> ld" chain will be something else that
performs the same function.

Since llvm-ld invokes the system compiler to do the actual linking, the 
executable it produces will already have the proper init/exit sequences. So 
llvm-collect2 would not have anything to do.

To summarize:
- llvm-ld (currently) does not accept all flags that GCC passes to collect2
- an LLVM-aware collect2 would never perform the core function of collect2,
  which is generating init/exit code and data

Therefore, I think the scenario of llvm-gcc calling llvm-ld directly is 
preferable.

> The thing we're missing most right now is a volunteer to tackle this
> project :)

Since this is all new terrain for me, I might get stuck before producing
anything useful. But I'm willing to try.

Bye,
		Maarten

Chris Lattner

2007-Sep-26 00:17 UTC

[LLVMdev] Compiling zlib to static bytecode archive

On Sep 23, 2007, at 3:27 AM, Maarten ter Huurne wrote:
> On Friday 21 September 2007, Chris Lattner wrote:
>> On Sep 21, 2007, at 9:42 AM, Maarten ter Huurne wrote:
>>> However, it is not possible to let the zlib Makefile issue that
>>> command without patching the Makefile, because the fragment that
>>> does the linking is hardcoded to use the compiler command for
>>> linking:
>>>
>>>   example$(EXE): example.o $(LIBS)
>>>           $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
>>
>> Right, unfortunately the current Link Time Optimization model
>> requires the linker to "know" about LLVM.
>> http://llvm.org/docs/LinkTimeOptimization.html
>
> That's the reason I want to try and build a bytecode lib: to see if
> link time optimization of executable + libs has any effect on
> performance and on code size.

Right.

> My guess is that performance won't improve much, since there aren't
> that many calls per second which cross the app-lib boundary. But code
> size could improve if unused optional features can be eliminated as
> dead code because a function is only called in one particular way.

Makes sense!

> By the way, the example from that document does not work with the
> current llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:
>
> $ llvm-gcc a.o main.o -o main
> a.o: file not recognized: File format not recognized
> collect2: ld returned 1 exit status

Again, this is because your native linker doesn't support liblto.

> Linking with llvm-ld does work:
>
> $ llvm-ld a.o main.o -native -o main
> $ ./main
> $ echo $?
> 42
>
> The link step combines one or more input files into one output file.
> The input files can be all bytecode, all native or mixed. The output
> file can be bytecode or native. Since it is only possible to convert
> from bytecode to native and not vice versa, bytecode output requires
> all bytecode input. So the combinations are:
>
> bytecode input, bytecode output:
> Can be handled by llvm-ld without invoking system compiler/linker.

Yes, but note that this only works if you limit yourself to linker
options known by llvm-ld.  If you use funky stuff, llvm-ld won't be
able to handle it.  Also, llvm-ld may or may not handle archive
resolution correctly (I don't remember).

> native input, native output:
> Handled by system compiler/linker.
>
> bytecode or mixed input, native output:
> According to the llvm-ld man page, llvm-ld will generate native code
> from the bytecode files and invoke the system compiler to do the
> actual linking.

Yes.

>>> Would it be possible to make llvm-gcc call llvm-ld instead of the
>>> systemwide ld? I tried setting the environment variables
>>> COMPILER_PATH=/usr/local/bin and GCC_EXEC_PREFIX=llvm- but that had
>>> no effect.
>>
>> I see two solutions to this.  One is to have llvm-gcc call llvm-ld
>> when it has some option passed to it. Another would be to enhance
>> 'collect2' to know about LLVM files.  'collect2' is a GCC utility
>> invoked at link time, it would be the perfect place to add hooks.
>
> I found the documentation of collect2 here:
>   http://gcc.gnu.org/onlinedocs/gccint/Collect2.html
>
> Its purpose seems to be to act like ld and insert calls to
> initialization routines (and exit routines) before calling the real
> ld. The comment at the top of the source file describes it like this:
>
>    Collect static initialization info into data structures that can be
>    traversed by C++ initialization and finalization routines.

Right, that is its intended purpose.  It seems fairly straightforward
to abuse it for our devious plans though :)

> Originally I was under the impression that llvm-ld was just an
> LLVM-aware version of ld, but that is not the case. For example, when
> creating an output file in native format, it runs the system compiler
> on the generated native code and that compiler automatically picks up
> libraries such as libc, which must be specified explicitly to ld.
> Also, although llvm-ld accepts many of the options accepted by ld,
> GCC uses some ld options that llvm-ld does not accept.

Right.

> Going back to the two options you mentioned, they would lead to the
> following invocation chains. Let's use the "mixed input, native
> output" scenario: if we can support that, we can support the rest as
> well.
>
> llvm-gcc calling llvm-ld:
>   llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld
>
> enhance collect2:
>   llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

I'd rather enhance collect2 like this:

llvm-gcc -> llvm-collect2(liblto) -> ld

Where llvm-collect2 is just collect2 that dlopen's liblto to do the
optimization work. This makes it work much more naturally than adding
a whole new set of steps.  Depending on llvm-ld will never get you to
a world where LTO is transparent, because llvm-ld doesn't support a
lot of options and features that native linkers do.

> To summarize:
> - llvm-ld (currently) does not accept all flags that GCC passes to
>   collect2
> - an LLVM-aware collect2 would never perform the core function of
>   collect2, which is generating init/exit code and data
>
> Therefore, I think the scenario of llvm-gcc calling llvm-ld directly
> is preferable.

Ah, but if the llvm-collect2 version was enhanced to do everything it
does now, and additionally interface with liblto, then everyone wins :)

>> The thing we're missing most right now is a volunteer to tackle this
>> project :)
>
> Since this is all new terrain for me, I might get stuck before
> producing anything useful. But I'm willing to try.

Yay!  Many people will appreciate this!

-Chris

Maarten ter Huurne

2007-Sep-26 01:12 UTC

[LLVMdev] Compiling zlib to static bytecode archive

On Wednesday 26 September 2007, Chris Lattner wrote:
> > llvm-gcc calling llvm-ld:
> >   llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld
> >
> > enhance collect2:
> >   llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld
>
> I'd rather enhance collect2 like this:
>
> llvm-gcc -> llvm-collect2(liblto) -> ld
>
> Where llvm-collect2 is just collect2 that dlopen's liblto to do the
> optimization work. This makes it work much more naturally than adding
> a whole new set of steps.  Depending on llvm-ld will never get you to
> a world where LTO is transparent, because llvm-ld doesn't support a
> lot of options and features that native linkers do.

So llvm-collect2 would combine the functionality of the original collect2
and of llvm-ld?

When it executes, it would take the following steps:
1. for each input, determine whether it is in bytecode or native format
2. if there are no bytecode inputs, go to step 6
3. link the bytecode inputs and optimize the resulting bytecode, using liblto
4. if bytecode output was requested, we are done
5. generate native object in a temporary file
6. perform the init/exit fixups that the original collect2 does
7. invoke system linker to link the generated native object (if any) and the 
input native objects (if any)
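
The steps above can be sketched as a small planning function. This is a rough illustration in Python, not an implementation: `plan_link` and the action strings are invented for this example, and step 1 (classifying each input) is assumed to have been done already and passed in as a dictionary. No real liblto or collect2 interface is used.

```python
def plan_link(input_kinds, want_bytecode_output=False):
    """Return the ordered actions for one link, following the steps above.

    `input_kinds` maps each input file name to "bytecode" or "native"
    (the result of step 1). The returned strings are descriptive labels
    only; nothing is executed.
    """
    bytecode_inputs = [n for n, k in input_kinds.items() if k == "bytecode"]
    native_inputs = [n for n, k in input_kinds.items() if k == "native"]

    # Bytecode output is only possible when every input is bytecode.
    if want_bytecode_output and (native_inputs or not bytecode_inputs):
        raise ValueError("bytecode output requires all-bytecode input")

    actions = []
    if bytecode_inputs:                      # step 2: skip ahead if none
        actions.append("link and optimize bytecode")        # step 3
        if want_bytecode_output:                            # step 4
            actions.append("write bytecode output")
            return actions
        actions.append("generate native object")            # step 5
    actions.append("perform init/exit fixups")              # step 6
    actions.append("invoke system linker")                  # step 7
    return actions
```

With only native inputs the plan reduces to the last two actions, which is exactly what the original collect2 already does.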

Assuming those steps are correct, steps 6 and 7 could be implemented by using
the original collect2 and adding the generated native object to the list of
files to link. In other words, llvm-collect2 could be a separate process
that is called instead of collect2, does some processing and then runs the
original, unmodified collect2:
  llvm-gcc -> llvm-collect2(liblto) -> collect2 -> ld
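
A wrapper of this kind mostly has to rewrite the argument list before handing it to the original collect2. The following Python sketch shows one way that rewrite could look. Everything in it is hypothetical: `rewrite_link_args`, the `is_bytecode` predicate and the `generated_object` path are names made up for this example, and how the bytecode is actually linked and compiled is left out.

```python
def rewrite_link_args(args, is_bytecode, generated_object):
    """Build the argument list to pass on to the original collect2.

    Arguments for which `is_bytecode(arg)` is true are dropped, and
    `generated_object` (the native object produced from the linked
    bytecode) is put in the position of the first one. Options and
    native inputs are passed through unchanged.
    """
    out = []
    replaced = False
    for arg in args:
        if not arg.startswith("-") and is_bytecode(arg):
            if not replaced:
                out.append(generated_object)
                replaced = True
            continue
        out.append(arg)
    return out
```

Because options are passed through unchanged, the wrapper would not need to understand any flag that ld does not, which sidesteps the concern raised in the collect2 source comment.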

Bye,
		Maarten