thr3ads.net - llvm dev - [LLVMdev] Proposal: MCLinker - an LLVM integrated linker [Nov 2011]

Tang Luba

2011-Nov-01 18:24 UTC

[LLVMdev] Proposal: MCLinker - an LLVM integrated linker

Hi, Brooks,

Since many BSD developers in Taiwan are helping with this project, one
of MCLinker's main objectives is to make a direct contribution to the
BSD realm. Please feel free to give us suggestions to make sure we can
achieve this goal. Any comments are appreciated.

We realized open discussion on the mailing list is necessary, and we
hope this thread can be a beginning to openly discuss the project
scope, features, the why and the how of MCLinker.

I've read the list, and here are some ideas from our group.
>  - LTO framework
>  - Link time optimization against IR or machine code
>  - Support for IR in ELF
LLVM already supports LTO on bitcode, and IMHO, it may be good enough.

In GCC, LTO produces 'fat' object files, because GCC needs to serialize
its IR into an 'intermediate language' (IL) and store the compressed IL
in the object files. In our experience, the 'fat' object files are about
10x bigger than the original ones and slow down the linking process
significantly, while the generated code gets only about a 7%~13%
improvement.

IMHO, LLVM provides a better solution than GCC. With LLVM, users can
compile source files into many small bitcode files. LTO can be
performed well when linking these small bitcode files into one 'big
bitcode'. MCLinker reads the 'big bitcode' and generates EXE/DSOs.
Since the 'big bitcode' is only a little bit bigger than the generated
file, we can avoid generating the 'fat' objects and still get enough
performance improvement.

Apart from the LTO, we also have some good ideas on link time
optimization. I will open another thread to discuss this later.
>  - linker scripts (or equivalent)
Linker scripts are a thorny problem. The grammar of the linker script
language in GNU ld is context sensitive and hard to implement.
Maybe we can list the necessary requirements first, and try to define
a simpler grammar.
>  - Incremental linking
>  - GNU ld compatibility
>  - IR processing by plugin
>  - Limited non-ELF support (for boot blocks, etc)
>  - Alternative hash table support
>  - Crunching support
>  - Be fast
>  - Native cross-architecture support
>  - Multipass lookup
>  - Unit tests
>  - Coded to LLVM standards (to allow inclusion in LLVM)
>  - linker is a library
>  - C and C++ support
>  - Architecture support: i386, x86_64, ARM, PPC(64),
>  - MIPS(64), PiNaCl
>  - Possible architecture support: sparc64
We also have some ideas about the features above. To keep the
discussion easy to follow, I will discuss them in other threads.

BTW, sorry for the appearance of the "Email Confidentiality Notice". I
have asked our IT department to remove it from all our emails
immediately. Sorry also for the scrambled characters in some names. I
have asked all members of my team to use their English names.

Best regards,
Luba

Joerg Sonnenberger

2011-Nov-01 18:38 UTC

On Wed, Nov 02, 2011 at 02:24:12AM +0800, Tang Luba wrote:
> >  - linker scripts (or equivalent)
> Linker scripts is a thorny problem. The grammar of link script
> language in GNU ld is context sensitive, and hard to be implemented.
> Maybe we can list the necessary requirements first, and try to define
> a simpler grammar.
It is not necessary to preserve compatibility with GNU linker scripts.
There are many good reasons not to, but the functionality has to exist.
Some of the issues to be addressed:

(1) Mapping sections to fixed offsets.
(2) Ordering of sections and aggregation into PT_LOAD segments.
(3) Setting non-default attributes on segments, e.g. making debug
information loadable for specific applications.
(4) Adding start-of-section/end-of-section markers.
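
A minimal Python sketch of what items (1), (2) and (4) ask of a layout engine, independent of any script syntax. The `Section` type, the `layout` function and the `__start_`/`__stop_` marker names are hypothetical, chosen only for illustration; segment aggregation and segment attributes (item 3) are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    name: str
    size: int
    align: int = 4
    fixed_addr: Optional[int] = None  # item (1): pin the section to an address


def align_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def layout(sections, base):
    """Place sections in list order (item 2) and record markers (item 4)."""
    addresses, markers = {}, {}
    cursor = base
    for sec in sections:
        if sec.fixed_addr is not None:
            if sec.fixed_addr < cursor:
                raise ValueError(f"{sec.name} overlaps the previous section")
            cursor = sec.fixed_addr
        else:
            cursor = align_up(cursor, sec.align)
        addresses[sec.name] = cursor
        # Hypothetical marker naming: ".data" becomes "__start_data".
        ident = sec.name.lstrip(".").replace(".", "_")
        markers[f"__start_{ident}"] = cursor
        cursor += sec.size
        markers[f"__stop_{ident}"] = cursor
    return addresses, markers
```

Whatever grammar is chosen would only need to produce the ordered `sections` list; the placement logic itself stays small.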

Joerg

Luba Tang

2011-Nov-01 19:15 UTC

> Apart from the LTO, we also have some good idea on link time
> optimization. I will open another thread to discuss this later.
Sorry, I made a stupid mistake. I meant "some good ideas on the optimizations
that can be done by linkers, such as instruction relaxation and how to
use the IP register efficiently".

Diego Novillo

2011-Nov-02 03:11 UTC

On Tue, Nov 1, 2011 at 18:24, Tang Luba <lubatang at gmail.com> wrote:
> In GCC, LTO causes 'fat' object files, because GCC needs to serialize
> IR into 'intermediate language' (IL) and compress IL in object files.
> In our experience, the 'fat' object files are x10 bigger than the
> original one, and slow down the linking process significantly. The
> generated code can get about only 7%~13% improvement.
Right.  Though GCC 4.7 will offer an option to emit just bytecode in
object files.  Additionally, the biggest gains we generally observe
with LTO are when it's coupled with FDO.  And almost always, the
biggest wins are in the inliner
(http://gcc.gnu.org/wiki/LightweightIpo).
> Apart from the LTO, we also have some good idea on link time
> optimization. I will open another thread to discuss this later.
You may want to look at Diablo (http://diablo.elis.ugent.be/).  An
optimizing linker that has been around for a while.  I'm not sure
whether it is still being developed, but they had several interesting
ideas in it.


Diego.

陳韋任

2011-Nov-02 03:19 UTC

> You may want to look at Diablo (http://diablo.elis.ugent.be/).  An
> optimizing linker that has been around for a while.  I'm not sure
> whether it is still being developed, but they had several interesting
> ideas in it.
Diablo is still being maintained. I checked its status a few days ago
on the Diablo mailing list.

Regards,
chenwj 

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667

Chinyen Chou

2011-Nov-02 08:57 UTC

Thanks for the useful information. We notice that the idea behind LIPO could
also help LLVM LTO once LLVM has FDO/PGO. As for Diablo, we'll study it, and
I think we'll get some good ideas from it.

In MCLinker, the details of the instructions and data in bitcode are still
kept during linking, so some opportunities to optimize the instructions in
bitcode become obvious. Instruction relaxation is one such case.
(Since ARM is one of the targets we focus on, I'm going to use ARM to
illustrate the problem.)

When linking bitcode with other object files, stubs are necessary if the
branch target is out of range or the branch switches between ARM and THUMB
mode. The Google gold linker basically uses two kinds of stubs. One is a
sequence of consecutive branch instructions, and the other is one branch
instruction with one following instruction (e.g., ldr) that changes PC
directly.

An example of the latter case:

1: bl    <stub_address>
...
2: ldr   pc, [pc, #-4]   ; stub
3: dcd   R_ARM_ABS32(X)
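
The range condition and the `[pc, #-4]` addressing in the stub can be checked with a little arithmetic. Below is a minimal Python sketch, assuming ARM state (where PC reads as the instruction address plus 8) and the signed 24-bit word offset of B/BL; the function names are hypothetical. Following the text above, any ARM/THUMB switch is treated as requiring a stub, although a real linker may be able to handle some of those cases without one.

```python
ARM_PC_BIAS = 8  # in ARM state, PC reads as the instruction address + 8


def branch_offset(branch_addr, target_addr):
    """Byte offset that a B/BL at branch_addr must encode to reach target_addr."""
    return target_addr - (branch_addr + ARM_PC_BIAS)


def in_bl_range(branch_addr, target_addr):
    """True if an ARM-state B/BL (signed 24-bit offset, in words) reaches the target."""
    off = branch_offset(branch_addr, target_addr)
    return off % 4 == 0 and -(1 << 25) <= off < (1 << 25)


def needs_stub(branch_addr, target_addr, mode_switch=False):
    """Assumption taken from the text: out-of-range or mode-switching calls get a stub."""
    return mode_switch or not in_bl_range(branch_addr, target_addr)


def literal_address(insn_addr, imm):
    """Address read by `ldr rX, [pc, #imm]` in ARM state."""
    return insn_addr + ARM_PC_BIAS + imm
```

For example, `literal_address(0x1000, -4)` is `0x1004`, which is why the `dcd` word sits directly after the `ldr` in the stub.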

In MCLinker, we can optimize it as follows:

X: ldr   ip, [pc, #-4]
Y: dcd   R_ARM_ABS32(X)
Z: bx    ip

Before the optimization, some processors suffer from flushing the ROB/Q
because their pipelines fill up with the invalid instructions that
appear immediately after the ldr. None of these instructions should be
executed, so the processor must flush them when the ldr is committed.

Since all details of the instructions and data are preserved, MCLinker can
directly rewrite the program without inserting a stub. It can replace the
1: bl instruction with a longer branch ending in Z: bx, and the performance
of the program is therefore improved by more efficient use of the branch
target buffer (BTB).
This is just one case, and there are other optimizations we can do.

Thanks,
Chinyen

Don Quixote de la Mancha

2011-Nov-03 06:05 UTC

A helpful link-time optimization would be to place subroutines that
are used close together in time also close together in the executable
file.  That also goes for data that is in the executable file, whether
initialized (.data segment) or zero-initialized (.bss).

If the unit of linkage of code is the function rather than the
compilation module, and the unit of linkage of data is the individual
data item rather than all the .data and .bss items together that are
in a compilation unit, you could rearrange them at will.

For architectures such as ARM that cannot make jumps to faraway
addresses, you could make the destinations of subroutine calls close
to the caller so you would not need so many trampolines.

The locality improves the speed because the program would use the code
and data caches more efficiently, and would page in data and code from
disk less often.

Having fewer physically resident pages also saves on precious kernel
memory.  I read in O'Reilly's "Understanding the Linux Kernel" that on
the i386 architecture, the kernel's page tables consume most of the
physical memory in the computer, leaving very little physical memory
for user processes!

A first cut would be to start with the runtime program startup code,
which for a C program then calls main().  The subroutines that main
calls would be placed next in the file.  Suppose main calls Foo() and
then Bar().  One would then place each of the subroutines that Foo()
calls all together, then each of the subroutines that Bar() calls.
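
The placement just described is a breadth-first walk of the call graph. A minimal Python sketch, assuming the call graph is already available as a mapping from each function to the functions it calls, listed in call order (the `_start`, `Foo` and `Bar` names are placeholders):

```python
from collections import deque


def breadth_first_order(call_graph, entry):
    """Return functions in breadth-first order from entry: callers before callees."""
    order, seen = [], {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for callee in call_graph.get(fn, []):
            if callee not in seen:  # place each function only once
                seen.add(callee)
                queue.append(callee)
    return order


graph = {
    "_start": ["main"],
    "main": ["Foo", "Bar"],
    "Foo": ["f1", "f2"],
    "Bar": ["b1", "f1"],
}
print(breadth_first_order(graph, "_start"))
```

Functions that are unreachable from the entry point are not placed by this sketch; a linker would still have to put them somewhere, for instance after the ordered ones.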

It would be best if some static analysis were performed to determine
in what order subroutines are called, and in what order .data and .bss
memory is accessed.

Getting that analysis right for the general case would not be easy, as
the time-order in which subroutines are called may of course depend on
the input data.  To improve the locality, one could produce an
instrumented executable which saved a stack trace at the entry of each
subroutine.  Examination of all the stack traces would enable a
post-processing tool to generate a linker script that would be used
for a second pass of the linker.  This is a form of profiler-guided
optimization.
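
A minimal Python sketch of the post-processing step, under simplifying assumptions: each recorded event is reduced to the name of the function that was entered (rather than a full stack trace), and the output is a plain one-symbol-per-line list rather than any real linker script syntax. Both function names are hypothetical.

```python
def order_from_trace(entry_events, all_functions):
    """Order functions by first entry in the trace; untraced functions go last."""
    # dict.fromkeys keeps the first occurrence of each name, in order.
    order = list(dict.fromkeys(entry_events))
    traced = set(order)
    order += [fn for fn in all_functions if fn not in traced]
    return order


def emit_order_file(order):
    """Render the order as text, one symbol per line (illustrative format only)."""
    return "\n".join(order) + "\n"
```

The second link pass would then read this list and place functions in that order.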

For extra credit one could prepare multiple input files (or for
interactive programs, several distinctly different GUI robot scripts).
Then the tool that prepared the linker script would try to optimize
for the average case for most code.
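
One simple way to approximate "the average case" is to average each function's position across the per-run orders. This is only an illustrative heuristic, not something proposed in the thread; `merge_runs` is a hypothetical name.

```python
def merge_runs(runs):
    """Merge per-run function orders (earliest first) by mean position."""
    functions = list(dict.fromkeys(fn for run in runs for fn in run))

    def mean_rank(fn):
        # A function missing from a run is ranked just past that run's end.
        ranks = [run.index(fn) if fn in run else len(run) for run in runs]
        return sum(ranks) / len(runs)

    # sorted() is stable, so ties keep their first-seen order.
    return sorted(functions, key=mean_rank)
```

A weighted mean would let more representative inputs count for more.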

Regards,

Don Quixote
-- 
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com
