Hi guys,

After working on LLD for a long period of time, I think I have found a few things that we should improve in the LLD design, for both ease of development and runtime performance. I would like to get feedback on this proposal. Thanks!

*Problems with the current LLD architecture*

The current LLD architecture has, in my opinion, two issues.

*The atom model is not the best model for some architectures*

The atom model makes sense only for Mach-O, but it's used everywhere. I guess that we originally expected that we would be able to model the linker's behavior beautifully using the atom model, because the atom model seemed like a superset of the section model. Although it *can*, it turned out that it's not necessarily a natural and efficient model for ELF or PE/COFF, on which section-based linking is expected. On ELF and PE/COFF, sections are units of atomic data. We divide a section into smaller "atoms" and then restore the original data layout later to preserve the section's atomicity. That complicates the linker internals. It also slows down the linker because of the overhead of creating and manipulating atoms. In addition, since section-based linking is expected on these architectures, some linker features are defined in terms of sections. An example is "select largest section" in PE/COFF. In the atom model we don't have a notion of sections at all, so we had to simulate such features using atoms in tricky ways.

*One symbol resolution model doesn't fit all*

The symbol resolution semantics are not the same across the three architectures (ELF, Mach-O and PE/COFF), but we have only one "core" linker for symbol resolution. The core linker implements the Unix linker semantics: the linker visits one file at a time until all undefined symbols are resolved. For archive files having circular dependencies, you can group them to tell the linker to visit them more than once. This is not the only way to build a linker, and it's neither the simplest nor the fastest. It's just that the Unix linker semantics are defined this way, and we all follow them for compatibility. For PE/COFF, the linker semantics are different. The order of files on the command line doesn't matter. The linker scans all files first to create a map from symbols to files, and uses the map to resolve all undefined symbols. The PE/COFF semantics are currently simulated on top of the Unix linker semantics and groups. That makes the linker inefficient because of the overhead of visiting archive files again and again. It also makes the code bloated and awkward. In short, we generalize too much, and we share code too much.

*Proposal*

1. Re-architect the linker based on the section model where it's appropriate.
2. Stop simulating different linker semantics using the Unix model. Instead, directly implement the native behavior.

When this is done, the atom model will be used only for Mach-O; the other two will be built on the section model. PE/COFF will have a different "core" linker than Unix's. I expect this will simplify the design and also improve the linker's performance (achieving better performance is probably the best way to convince people to try LLD).

I don't think we can gradually move from the atom model to the section model, because atoms are everywhere. The two models are so different that we cannot mix them in one place. Although we can reuse the design and the outline of the existing code, this is going to be more like a major rewrite than an update. So I propose developing the section-based implementations as new "ports" of LLD.
I plan to start with the PE/COFF port because I'm familiar with the code base and the amount of code is smaller than the ELF port's. Also, the ELF port is developed and maintained by many developers, which makes porting harder compared to PE/COFF, which is written and maintained only by me. Thus, I'm going to use PE/COFF as an experimental platform to see how this works. Here is the plan.

1. Create a section-based PE/COFF linker backend as a new port.
2. If everything is fine, do the same thing for ELF. We may want to move common code for a section-based linker out of the new PE/COFF port to share it with ELF.
3. Move the library for the atom model into the sub-directory for the Mach-O port.

The resulting linker will share less code between ports. That's not necessarily a bad thing -- I actually think it's a good thing, because in order to share code we currently need too many workarounds. This change should fix the balance so that we get (1) shared code that can naturally be shared by multiple ports, and (2) simpler, faster code.

*Work Estimation*

It's hard to tell, but I can probably create a PE/COFF linker in a few weeks that works reasonably well and is ready for code review as a first set of patches. I have already built a complete linker for Windows, so the hardest part (understanding the format) is already done. Once that's done, I can make a better estimate for ELF.

*Caveat*

*Why not define a section as an atom and keep using the atom model?*

If we did that, we would have to allow atoms to have more than one name. Each name would have an offset in the atom (to represent symbols whose offset from the section start is not zero). But we would still need to copy section attributes to each atom. The resulting model would no longer look like the atom model, but like a mix of the atom model and the section model, and that comes with the cost of both designs. I think it's too complicated.

*Notes*

We want to make sure there are no existing LLD users who depend on the atom model for ELF, or, if there are such users, we want to come up with a transition path for them.
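To make the contrast between the two resolution models described above concrete, here is a minimal sketch of both under the simplifications the proposal mentions. This is illustrative C++ only, not LLD's actual code: InputFile, resolveUnix, and resolveCOFF are hypothetical names, archive groups are omitted, and symbol kinds (weak, common, etc.) are ignored.

  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  // Hypothetical input file: the symbols it defines and the symbols it
  // references. Not LLD's real data structures.
  struct InputFile {
    std::vector<std::string> Defined;
    std::vector<std::string> Undefined;
    bool Lazy; // archive member: only loaded if it resolves something
  };

  // Unix/ELF-style resolution (simplified; no --start-group/--end-group):
  // visit each file once, in command-line order. A lazy file that defines
  // nothing currently undefined is skipped and never revisited.
  void resolveUnix(const std::vector<InputFile> &Files,
                   std::set<std::string> &Undef) {
    std::set<std::string> Defined;
    for (const InputFile &F : Files) {
      bool Load = !F.Lazy;
      for (const std::string &S : F.Defined)
        Load |= Undef.count(S) != 0;
      if (!Load)
        continue; // skipped now, never reconsidered
      for (const std::string &S : F.Defined) {
        Defined.insert(S);
        Undef.erase(S);
      }
      for (const std::string &S : F.Undefined)
        if (!Defined.count(S))
          Undef.insert(S);
    }
  }

  // PE/COFF-style resolution (simplified): order-independent. Index every
  // definition up front, then run a worklist over the undefined symbols.
  // (A real linker would unconditionally load the non-lazy object files
  // first; that step is elided here.)
  void resolveCOFF(const std::vector<InputFile> &Files,
                   std::set<std::string> &Undef) {
    std::map<std::string, const InputFile *> Index;
    for (const InputFile &F : Files)
      for (const std::string &S : F.Defined)
        Index.emplace(S, &F); // first definition wins (simplified)

    std::set<const InputFile *> Loaded;
    std::vector<std::string> Work(Undef.begin(), Undef.end());
    while (!Work.empty()) {
      std::string S = Work.back();
      Work.pop_back();
      auto It = Index.find(S);
      if (It == Index.end())
        continue; // no definition anywhere: stays undefined
      Undef.erase(S);
      if (Loaded.insert(It->second).second) // file pulled in the first time
        for (const std::string &U : It->second->Undefined)
          if (Undef.insert(U).second)
            Work.push_back(U);
    }
  }

The second loop touches each file at most once regardless of command-line order, whereas emulating it on top of the first requires re-visiting archives in groups, which is the overhead the proposal wants to eliminate.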
On Fri, May 1, 2015 at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> *Caveat* *Why not define a section as an atom and keep using the atom
> model?* If we did that, we would have to allow atoms to have more than
> one name. Each name would have an offset in the atom (to represent
> symbols whose offset from the section start is not zero). But we would
> still need to copy section attributes to each atom. The resulting model
> would no longer look like the atom model, but like a mix of the atom
> model and the section model, and that comes with the cost of both
> designs. I think it's too complicated.

Rafael and I have been discussing this change recently. It makes atoms actually atomic, and it also splits out symbols, which has been needed. The main reason I like this over each target having its own model is that it gives us a common textual representation to write tests with.

As for symbol resolution: it seems the actual problem is name lookup, not the core resolver semantics. I'd rather not end up with basically three separate linkers in lld.

- Michael Spencer
On Fri, May 1, 2015 at 1:32 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> Rafael and I have been discussing this change recently. It makes atoms
> actually atomic, and it also splits out symbols, which has been needed.
> The main reason I like this over each target having its own model is
> that it gives us a common textual representation to write tests with.

If you allow multiple symbols in one atom, is the new definition of an atom any different from a section? If so, in what way?

> As for symbol resolution: it seems the actual problem is name lookup,
> not the core resolver semantics.

What's the difference between name lookup and the core resolver semantics?

> I'd rather not end up with basically three separate linkers in lld.

I basically agree. However, if you take a look at the code of the PE/COFF port, you'll find something weird here and there.
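For readers following the thread, this is roughly the hybrid type being debated here and in the caveat above: a section kept atomic, carrying several named offsets. These are illustrative types, not LLD's real classes.

  #include <cstdint>
  #include <string>
  #include <vector>

  // A name attached to a point inside the section's data.
  struct NamedOffset {
    std::string Name;
    uint64_t Offset; // distance of the symbol from the section start
  };

  // "A section as an atom": the data stays atomic, but the type now has
  // to carry both the section attributes (which the pure atom model
  // would otherwise copy onto every atom split out of the section) and
  // an atom-style list of names.
  struct SectionAtom {
    std::string SectionName; // e.g. ".text$mn" or ".data.rel.ro"
    uint32_t Flags;          // read/write/execute, mergeable, ...
    uint32_t Alignment;
    std::vector<uint8_t> Data;        // section contents, never split
    std::vector<NamedOffset> Symbols; // multiple names, each at an offset
  };

At this point the type is a section plus a symbol table, which is what Rui's question is getting at: the hybrid pays for both models at once.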
I am at the airport waiting to go on vacation, but I must say I am extremely happy to see this happen! I agree with the proposed direction and steps: Implement section based linking for coff. Use that for elf. If it makes sense, use it for macho.

On May 1, 2015 3:32 PM, "Rui Ueyama" <ruiu at google.com> wrote:
> [Rui's original proposal quoted in full; snipped]
On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> The atom model is not the best model for some architectures

The atom model is a good fit for the llvm compiler model for all architectures. There is a one-to-one mapping between llvm::GlobalObject (e.g. a function or global variable) and lld::DefinedAtom. The problem is the ELF/PECOFF file format. (Actually, Mach-O is also section based, but we have refrained from adding complex section-centric features to it, so mapping it to atoms is not too hard.)

I'd rather see our effort put toward moving ahead to an llvm based object file format (aka the "native" format), which bypasses the impedance mismatch of going through ELF/COFF.

> One symbol resolution model doesn't fit all

Yes, the Resolver was meant to call out to the LinkingContext object to direct it on how to link. Somehow that got morphed into "there should be a universal data model such that, when the Resolver processes the input data, the right platform-specific linking behavior falls out".

-Nick
On May 1, 2015 9:42 PM, "Nick Kledzik" <kledzik at apple.com> wrote:
> On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
>> The atom model is not the best model for some architectures
>
> The atom model is a good fit for the llvm compiler model for all
> architectures. There is a one-to-one mapping between llvm::GlobalObject
> (e.g. a function or global variable) and lld::DefinedAtom.

That is not the input to the linker and therefore irrelevant.

> The problem is the ELF/PECOFF file format. (Actually, Mach-O is also
> section based, but we have refrained from adding complex section-centric
> features to it, so mapping it to atoms is not too hard.)

The objective is to build an elf and coff linker. The input has sections, and splitting them is a total waste of time and extra design complexity.

> I'd rather see our effort put toward moving ahead to an llvm based
> object file format (aka the "native" format), which bypasses the
> impedance mismatch of going through ELF/COFF.

Absolutely not. We have to be able to handle elf and coff and do it well. Also, gold shows that elf at least works extremely well. With function sections the compiler is in complete control of the size of the units the linker uses. With my recent work on MC the representation is also very efficient. I have no reason to believe coff is any different.

Cheers,
Rafael
On Fri, May 1, 2015 at 6:46 PM Nick Kledzik <kledzik at apple.com> wrote:
> On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
>> *The atom model is not the best model for some architectures*
>
> The atom model is a good fit for the llvm compiler model for all
> architectures. There is a one-to-one mapping between llvm::GlobalObject
> (e.g. a function or global variable) and lld::DefinedAtom.

I'm not sure how that's really relevant. On some architectures, the unit at which linking is defined to occur isn't a global object. A classic example of this is architectures that have a hard semantic reliance on grouping two symbols together and linking either both or neither of them.

> The problem is the ELF/PECOFF file format. (Actually, Mach-O is also
> section based, but we have refrained from adding complex section-centric
> features to it, so mapping it to atoms is not too hard.)
>
> I'd rather see our effort put toward moving ahead to an llvm based
> object file format (aka the "native" format), which bypasses the
> impedance mismatch of going through ELF/COFF.

We still have to be able to (efficiently) link existing ELF and COFF objects, though? While I'm actually pretty interested in some better object file format, I also want a better linker for the world we live in today...
On 1 May 2015 at 20:31, Rui Ueyama <ruiu at google.com> wrote:
> *One symbol resolution model doesn't fit all*
> The symbol resolution semantics are not the same across the three
> architectures (ELF, Mach-O and PE/COFF), but we have only one "core"
> linker for symbol resolution. [...] In short, we generalize too much,
> and we share code too much.

Why can't LLD be free to implement a resolution algorithm that performs better? The PE/COFF method you describe seems more efficient than the existing ELF method. What is stopping LLD from using the PE/COFF method for ELF? It could also do further optimizations, such as caching the resolved symbols. To me, the existing algorithms read as ELF == full table scan, PE/COFF == indexed.

Also, could some of the symbol resolution be done at compile time? E.g. if I include stdio.h, I know which link-time library it is associated with, so I can resolve those symbols at compile time. Maybe we could store that information in the precompiled header file format, and subsequently in the .o files. This would then leave far fewer symbols to resolve at link time.

Kind Regards

James
On Mon, May 04, 2015 at 09:29:16AM +0100, James Courtier-Dutton wrote:
> Also, could some of the symbol resolution be done at compile time?
> E.g. if I include stdio.h, I know which link-time library it is
> associated with, so I can resolve those symbols at compile time.

Where would you get that information from? No such tagging exists in standard C or even the extended dialect of C clang is implementing.

Joerg
Most of what I wanted to say has been said, but I wanted to explicitly call out COMDAT groups as something we want that doesn't fit the atom model very well. Adding first-class COMDATs was necessary for implementing large parts of the Microsoft C++ ABI, but it also turns out to be really handy on other platforms. We've made a number of changes to Clang's IRgen to do things like eliminate duplicate dynamic initialization for static data members of class templates and share code for complete, base, and deleting destructors.

Basically, COMDAT groups are a tool that the compiler can use to change the way things are linked without changing the linker. They allow the compiler to add new functionality and reduce coupling between the compiler and the linker.

This is a real tradeoff worth thinking about. I think for many platforms (Windows, Linux) Clang is not the system compiler, and we need to support efficiently linking against existing libraries for a long time to come. There are other platforms (Mac, PS4) with a single toolchain where controlling the linker allows adding new functionality quickly. I think Alex is right; we should probably meet some time and figure out what people need and how to support both kinds of platform well.

Reid

On Fri, May 1, 2015 at 6:42 PM, Nick Kledzik <kledzik at apple.com> wrote:
> [Nick's reply quoted above; snipped]
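As a concrete illustration of what COMDAT groups buy the compiler (plain C++, nothing LLD-specific; nextId is a made-up function assumed to be defined once in some other translation unit):

  int nextId(); // assumed: defined once in some other translation unit

  template <typename T> struct Counter {
    // A static data member of a class template: every TU that uses
    // Counter<T> emits both the variable and its dynamic initializer.
    // Clang places them in one COMDAT group keyed by the variable, so
    // after linking the initializer runs exactly once -- the
    // deduplication described above.
    static int Id;
  };
  template <typename T> int Counter<T>::Id = nextId();

  // An inline function is likewise emitted into every TU that uses it;
  // each copy sits in its own COMDAT group and the linker keeps one.
  inline int answer() { return 42; }

  // Instantiate so this TU actually emits Counter<long>::Id and answer().
  int use() { return Counter<long>::Id + answer(); }

The linker-side rule is stated in terms of sections: duplicate groups are kept or discarded as a unit, which is part of why first-class COMDAT support fits a section-based model more naturally than the atom model.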
On Mon, May 4, 2015 at 1:29 AM, James Courtier-Dutton <james.dutton at gmail.com> wrote:
> Why can't LLD be free to implement a resolution algorithm that performs
> better? The PE/COFF method you describe seems more efficient than the
> existing ELF method. What is stopping LLD from using the PE/COFF method
> for ELF? It could also do further optimizations, such as caching the
> resolved symbols. To me, the existing algorithms read as ELF == full
> table scan, PE/COFF == indexed.

The two semantics are not compatible; the results of the two are not always the same. For example, this is why we have to pass -lc after object files instead of at the beginning of the command line. "ld -lc foo.o" would just skip libc, because when the linker visits the library there are no undefined symbols to be resolved. foo.o would then add undefined symbols that could have been resolved using libc, but it's too late. The link would fail. This is how linkers on Unix work. There are other differences resulting from this, so we cannot change it unless we break compatibility.

> Also, could some of the symbol resolution be done at compile time?
> E.g. if I include stdio.h, I know which link-time library it is
> associated with, so I can resolve those symbols at compile time.
> Maybe we could store that information in the precompiled header file
> format, and subsequently in the .o files.
> This would then leave far fewer symbols to resolve at link time.

You can link against an alternative libc, for example, so that's not usually doable.
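A tiny self-contained toy (same simplified one-pass semantics as the sketch after the original proposal; not LLD code) that reproduces exactly this failure mode:

  #include <cstdio>
  #include <set>
  #include <string>
  #include <vector>

  // foo.o is an object file needing printf; libc is modeled as a lazy
  // archive that defines it.
  struct File { std::set<std::string> Def, Undef; bool Lazy; };

  static bool link(const std::vector<File> &CmdLine) {
    std::set<std::string> Undef, Def;
    for (const File &F : CmdLine) {
      bool Load = !F.Lazy; // object files always load
      for (const std::string &S : F.Def)
        Load |= Undef.count(S) != 0; // archives load only on demand
      if (!Load)
        continue; // skipped -- and never revisited
      for (const std::string &S : F.Def) { Def.insert(S); Undef.erase(S); }
      for (const std::string &S : F.Undef)
        if (!Def.count(S)) Undef.insert(S);
    }
    return Undef.empty(); // link succeeds iff nothing is left undefined
  }

  int main() {
    File Libc = {{"printf"}, {}, /*Lazy=*/true};
    File Foo = {{"main"}, {"printf"}, /*Lazy=*/false};
    std::printf("ld -lc foo.o: %s\n",
                link({Libc, Foo}) ? "ok" : "undefined printf");
    std::printf("ld foo.o -lc: %s\n",
                link({Foo, Libc}) ? "ok" : "undefined printf");
  }

Under the map-based PE/COFF-style resolution, both orders would succeed, which is why the two semantics cannot be unified without emulating one on top of the other.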
On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> *Proposal*
> 1. Re-architect the linker based on the section model where it's appropriate.
> 2. Stop simulating different linker semantics using the Unix model.
> Instead, directly implement the native behavior.

Preface: I have never personally contributed code to LLD, so don't take anything I'm about to say too seriously. This is not a mandate or anything, just an observation/idea.

I think that there is an alternative solution to these exact same problems. What you've identified here is that there are two camps of people working on LLD, and they have conflicting goals:

- Camp A: LLD is infrastructure for the next generation of awesome linking and toolchain features; it should take advantage of how compilers work to offer new features, performance, etc., without deep concern for compatibility.

- Camp B: LLD is a drop-in replacement system linker (notably for COFF and ELF systems), which is best of breed, with no compromises w.r.t. that goal.

I think the problem here is that these lead to natural and inescapable tensions, and Alex summarized how Camp B has been steering LLD away from what Camp A people want. This isn't bad in and of itself, because what Camp B wants is clearly and unarguably good for LLVM. However, it is also not sufficient, and while innovation in the linker space (e.g. a new "native" object file format generated directly from compiler structures) may or may not actually "work" or be "worth it", we won't know unless we try, and that won't fulfill its promise if there are compromises to Camp B.

So here's my counterproposal: *two different linkers.*

Let's stop thinking about lld as one linker, and instead think of it as two different ones. We'll build a Camp B linker which is the best-of-breed section based linker. It will support linker scripts and do everything better than any existing section based linker. The first step of this is to do what Rui proposes and rip atoms out of the model.

We will *also* build a no-holds-barred awesome atom based linker that takes advantage of everything it can from LLVM's architecture to enable innovative new tools without worrying too much about backwards compatibility.

These two linkers should share whatever code makes sense, but also shouldn't try to share code that doesn't make sense. The split between the semantic model of sections vs. atoms seems like a very natural one to me.

One question is: does it make sense for these to live in the same lld subproject, or be split into two different subprojects? I think the answer to that question is driven by whether there is shared code common to the two linkers that doesn't make sense to sink down into the llvm subproject itself.

What do you think?

-Chris
On Mon, May 04, 2015 at 12:52:55PM -0700, Chris Lattner wrote:
> I think the problem here is that these lead to natural and inescapable
> tensions, and Alex summarized how Camp B has been steering LLD away
> from what Camp A people want. This isn't bad in and of itself, because
> what Camp B wants is clearly and unarguably good for LLVM. However,
> it is also not sufficient, and while innovation in the linker space
> (e.g. a new "native" object file format generated directly from
> compiler structures) may or may not actually "work" or be "worth it",
> we won't know unless we try, and that won't fulfill its promise if
> there are compromises to Camp B.

It has been said in this thread before, but I fail to see how the atom model is an actual improvement over the fine-grained section model. It seems to be artificially restricted for no good reason.

> Let's stop thinking about lld as one linker, and instead think of it as
> two different ones. We'll build a Camp B linker which is the
> best-of-breed section based linker. It will support linker scripts and
> do everything better than any existing section based linker. The first
> step of this is to do what Rui proposes and rip atoms out of the model.

This is another item that has been irritating me. While it is a very laudable goal not to depend on linker scripts for the common case, not having the functionality of fine-grained output control is certainly a problem. Linker scripts are crucial for embedded developers and at least significant for anything near a system kernel.

> We will *also* build a no-holds-barred awesome atom based linker that
> takes advantage of everything it can from LLVM's architecture to enable
> innovative new tools without worrying too much about backwards
> compatibility.

I'd say that a good justification for why an atom based linker is/can be better would be a good start...

Joerg
I'm sorry if my suggestion gave the impression that I disregard the Mach-O port of LLD. I do care about Mach-O. I do not plan to break or remove any functionality from the current Mach-O port, and I don't propose removing the atom model from the linker as long as it seems to be a good fit for that port (and it looks like it is).

As to the proposal to have two different linkers, I'd say that's not really a counter-proposal, as it's similar to what I'm proposing. Maybe the view of "future file formats vs. the existing formats" (or "experimental platform vs. practical tool") is not the right way to frame the difference between the atom model and the section model, since Mach-O is an existing file format that we'd want to keep on the atom model. I think we want both models even for the existing formats.

My proposal can be read as suggesting we split LLD into two major parts, the atom-based and the section-based, while keeping the two under the same project and repository. I still think we can share code between the two, especially for LTO, which is why I prefer to have the two under the same repository.

On Mon, May 4, 2015 at 12:52 PM, Chris Lattner <clattner at apple.com> wrote:
> [Chris's counterproposal quoted above; snipped]
Hi,

There are a lot of advantages to continuing to improve the atom model and working within that model.

The atom model allowed lld to have a single intermediate representation for all the formats ELF/COFF/Mach-O. The native format allowed that intermediate representation to be serialized to disk too. If the intermediate representation's data structures were made available to scripting languages, most of linker script layout could be implemented by the end user. A new language could also be developed, as most users need one, and it could work on this intermediate representation.

The atom model also simplified a lot of use cases, like garbage collection, and let the resolver deal just with atoms. The section model may sound simple from the outside, but it has its own challenges, like separating symbol information from section information.

The atom model also simplifies testing, as there is one unique/nice way to test the core linker independent of the format. In addition to testing, there are tools designed to convert ELF to COFF or vice versa, and lld supports those use cases by design.

Most of all, embedded users want to reduce the final image size by compiling code with -ffunction-sections and -fdata-sections, which the atom model models directly. Thanks to Espindola for adding support for -fno-unique-section-names, which makes lld and the atom model more useful.

lld has already proven that it can link most of our llvm tools and self-host with reasonable performance; I don't see why we wouldn't want to continue with the atom model. The atom model also eases dealing with LTO in general.

In summary, I would like to continue the ELF ports using the atom model. _If a section model is being chosen for some flavors, let's not mix it up with the atom model, as I can see there would be very little code sharing._

Shankar Easwaran

On 5/4/2015 2:52 PM, Chris Lattner wrote:
> [Chris's counterproposal quoted above; snipped]
On Mon, May 4, 2015 at 12:52 PM, Chris Lattner <clattner at apple.com> wrote:
> I think that there is an alternative solution to these exact same
> problems. What you've identified here is that there are two camps of
> people working on LLD, and they have conflicting goals:
>
> - Camp A: LLD is infrastructure for the next generation of awesome
> linking and toolchain features; it should take advantage of how
> compilers work to offer new features, performance, etc., without deep
> concern for compatibility.
>
> - Camp B: LLD is a drop-in replacement system linker (notably for COFF
> and ELF systems), which is best of breed, with no compromises w.r.t.
> that goal.
>
> I think the problem here is that these lead to natural and inescapable
> tensions, and Alex summarized how Camp B has been steering LLD away
> from what Camp A people want.

I don't think this is correct. The only reason we should be having a major split along the Camp A / Camp B lines you describe is in the face of a concrete "compatibility" feature that *absolutely cannot* be implemented without sacrificing a modular, library-based, well-architected design. That is not what this thread is about, so I don't think your split is accurate.

I think it has merely been *forgotten* that it would be useful to have an infrastructure rather than "main() in a library", or even "main() in a separate binary"! This is the LLVM ethos (as I'm sure you know far better than I ;). Both LLVM and Clang prove that hard problems with significant "compatibility" concerns can be tackled, and top-tier in-production QoI achieved, while still upholding a high standard of external reusability that allows novel uses. Given what LLVM and Clang have been able to absorb (x86 ISA? C++? MSVC inline asm?) without fundamentally destabilizing their modular design, I think we can do a bit better with LLD before throwing in the towel and cleaving it into a "compatibility" part and a "shiny" part.

-- Sean Silva