thr3ads.net - llvm dev - [LLVMdev] LLD improvement plan [May 2015]

Rafael Espíndola

2015-May-28 04:06 UTC

[LLVMdev] LLD improvement plan

Replying to the thread, not just the email, since I was on vacation.

First, note that there is a nomenclature issue. A section in ELF/COFF
is closer to an atom in MachO than a MachO section IMHO.

A rose by any other name would smell as sweet, but sure as hell
creates a lot of confusion :-)

On 4 May 2015 at 18:05, Chris Lattner <clattner at apple.com> wrote:
> On May 4, 2015, at 1:16 PM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:
>> It has been said in this thread before, but I fail to see how the atom
>> model is an actual improvement over the fine grained section model. It
>> seems to be artificially restricted for no good reasons.
>
> Sections come with a huge amount of bloat and overhead that atoms do not.

No, they don't. Not on ELF for sure.

On ELF a section is just an entry into a table marking a region in the
file. The "huge amount of bloat" that people associate with sections
is actually just the extra space for the ultra large section names
".text._ZFoo....". Create multiple sections with the same name (I
implemented that) and the bloat goes away.
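
As a rough illustration of the point above, here is a minimal Python sketch that estimates per-section overhead for ELF64. The header layout is the standard 64-byte Elf64_Shdr; the function name `section_overhead`, the sample mangled names, and the simplification that each distinct name is stored exactly once in the string table (no suffix merging) are assumptions made for illustration only.

```python
import struct

# Layout of an ELF64 section header (Elf64_Shdr), little-endian:
# sh_name, sh_type (u32); sh_flags, sh_addr, sh_offset, sh_size (u64);
# sh_link, sh_info (u32); sh_addralign, sh_entsize (u64).
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")


def section_overhead(names):
    """Return (header_bytes, name_bytes) for a list of section names.

    Assumes each distinct name is stored once, NUL-terminated, in the
    section name string table, and ignores any suffix merging.
    """
    header_bytes = ELF64_SHDR.size * len(names)
    name_bytes = sum(len(name) + 1 for name in set(names))
    return header_bytes, name_bytes


# Hypothetical mangled names, one section per function.
functions = [f"_Z5fn{i:03d}v" for i in range(1000)]
unique_names = [".text." + f for f in functions]   # .text._Z5fn000v, ...
shared_names = [".text"] * len(functions)          # every section named .text

unique_cost = section_overhead(unique_names)
shared_cost = section_overhead(shared_names)
print("unique names:", unique_cost)
print("shared names:", shared_cost)
```

Under these assumptions the table entries cost the same in both cases; only the name bytes differ, and they collapse to a single short string when the name is shared.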

As has been pointed out before, a section in ELF is just a better version
of what is called an Atom in lld: It is a chunk of the file that the
linker can move, but it also supports multiple symbols, which is handy
for things like making the C1 and C2 constructors share the same
address or how MSVC implements vtables+RTTI.

Atoms being a distinct entity from sections (i.e., having non atomic
sections) is a necessity for MachO because it has more restrictive
sections (as Kevin was kind enough to explain).

Another way of looking at it (for understanding, I wouldn't use the
nomenclature in code) is that with this proposal lld will still be
atom based, we will just be extending atoms to support multiple
symbols. The logic for splitting sections into atoms would become

* ELF/COFF: one atom per section.
* MachO: One atom per global symbol.

And so MachO ends up with atoms that have only one symbol, but that is
just a special case.
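
The splitting rule above can be sketched roughly as follows. This is a toy Python model, not lld code: the `Symbol`, `Section`, and `Atom` classes and both `split_*` functions are hypothetical names, and the MachO case assumes global symbols sit at distinct offsets and ignores local symbols.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    offset: int            # offset from the start of the containing section
    is_global: bool = True


@dataclass
class Section:
    name: str
    size: int
    symbols: list = field(default_factory=list)


@dataclass
class Atom:
    section: str
    start: int
    size: int
    symbols: list


def split_elf_coff(section):
    # ELF/COFF: the whole section is one atom and keeps all of its symbols.
    return [Atom(section.name, 0, section.size, list(section.symbols))]


def split_macho(section):
    # MachO: each global symbol starts a new atom that runs until the next
    # global symbol (or the end of the section).
    globals_ = sorted((s for s in section.symbols if s.is_global),
                      key=lambda s: s.offset)
    atoms = []
    for i, sym in enumerate(globals_):
        end = globals_[i + 1].offset if i + 1 < len(globals_) else section.size
        atoms.append(Atom(section.name, sym.offset, end - sym.offset, [sym]))
    return atoms


# Two constructor symbols sharing one address in a single ELF section.
ctor = Section(".text", 24, [Symbol("_ZN3FooC1Ev", 0), Symbol("_ZN3FooC2Ev", 0)])
# Two functions laid out back to back in a MachO section.
text = Section("__text", 40, [Symbol("_f", 0), Symbol("_g", 16)])

elf_atoms = split_elf_coff(ctor)
macho_atoms = split_macho(text)
```

In this model the ELF/COFF atom carries both constructor symbols, while each MachO atom carries exactly one, which is the special case described above.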

>> This is another item that has been irritating me. While it is a very
>> laudable goal to not depend on linker scripts for the common case, not
>> having the functionality of fine grained output control is certainly a
>> problem. They are crucial for embedded developers and also at least
>> significant for anything near a system kernel.
>
> I’m not saying that the linker should eschew fine grained control, I’m
> saying it should dump linker scripts (and replace them with something
> better). Are you going to argue that linker scripts are great, or that
> they are what we would end up with if we weren’t driven by backwards
> compatibility goals?

I agree that this is a distinct issue. Linker scripts are a backward
compatibility pain. Directly using sections for ELF/COFF is *better*
than what is currently being done in lld.

As for organization, I agree with Rui's suggestion of 2 linkers in
one. One is ELF/COFF and uses sections, one is MachO and uses atoms.
Even with the split there is still enough common code that I don't
think having two repositories would help.

I don't agree that there is value in keeping the current atom on top of
ELF/COFF. It just adds cost to two formats whose sections are already
flexible atoms. It also prevents optimizations like not even reading
duplicated comdats.
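
A minimal Python sketch of what "not even reading duplicated comdats" could look like. The input shape, the function name `select_sections`, and the first-definition-wins policy are assumptions for illustration (COFF comdats also have other selection modes); this is not how any particular linker is implemented.

```python
def select_sections(object_files):
    """Return the (file, section) pairs whose contents need to be read.

    `object_files` is a list of (file_name, groups) pairs, where `groups`
    maps a comdat signature to the section names in that group.
    """
    seen = set()
    to_read = []
    for file_name, groups in object_files:
        for signature, sections in groups.items():
            if signature in seen:
                # Duplicate comdat: skip it without touching section bodies.
                continue
            seen.add(signature)
            to_read.extend((file_name, section) for section in sections)
    return to_read


# Hypothetical inputs: both objects carry the same inline function.
objects = [
    ("a.o", {"_Z3maxii": [".text._Z3maxii"], "_Z1av": [".text._Z1av"]}),
    ("b.o", {"_Z3maxii": [".text._Z3maxii"], "_Z1bv": [".text._Z1bv"]}),
]
kept = select_sections(objects)
```

The decision is made from the group signature alone, so the duplicate group in `b.o` is dropped before any of its section contents are looked at.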

Last but not least, on the idea of a new object format:

Everyone that has worked on linkers or assemblers has a list of things
they don't like about the format that was being used (I do for sure).
It is entirely possible that if we get our thoughts together we can
build a better format.

Having said that, an object file format has a tremendous cost. Just
look at the pain that is maintaining support for mips' interpretation
of r_info. We have to be sure there is a genuine advantage to it
before adding a new object format to the world. To know that I think
we need to push the current formats to see how far they go.

As an analogy, imagine if people working on BFD had decided that ELF
linking was too slow or missing features and had decided to create a
new format that fit BFD better. That would have been really
unfortunate, because as gold showed the problem was not ELF, it was
the organization of BFD, but now we would probably be stuck supporting
4 formats in llvm and lld.

Once we have a linker (and MC) that is as good as it gets for ELF/COFF
and MachO we will be in a good position for discussing a new format.

Cheers,
Rafael

Sean Silva

2015-May-29 00:42 UTC

[LLVMdev] LLD improvement plan

On Wed, May 27, 2015 at 9:06 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> Replying to the thread, not just the email, since I was on vacation.
>
> First, note that there is a nomenclature issue. A section in ELF/COFF
> is closer to an atom in MachO than a MachO section IMHO.
>
> A rose by any other name would smell as sweet, but sure as hell
> creates a lot of confusion :-)
>
> On 4 May 2015 at 18:05, Chris Lattner <clattner at apple.com> wrote:
> > On May 4, 2015, at 1:16 PM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:
> >> It has been said in this thread before, but I fail to see how the atom
> >> model is an actual improvement over the fine grained section model. It
> >> seems to be artificially restricted for no good reasons.
> >
> > Sections come with a huge amount of bloat and overhead that atoms do not.
>
> No, they don't. Not on ELF for sure.
>
> On ELF a section is just an entry into a table marking a region in the
> file. The "huge amount of bloat" that people associate with sections
> is actually just the extra space for the ultra large section names
> ".text._ZFoo....". Create multiple sections with the same name (I
> implemented that) and the bloat goes away.
>
> As has been pointed out before, a section in ELF is just a better version
> of what is called an Atom in lld: It is a chunk of the file that the
> linker can move, but it also supports multiple symbols, which is handy
> for things like making the C1 and C2 constructors share the same
> address or how MSVC implements vtables+RTTI.
>
> Atoms being a distinct entity from sections (i.e., having non atomic
> sections) is a necessity for MachO because it has more restrictive
> sections (as Kevin was kind enough to explain).
>
> Another way of looking at it (for understanding, I wouldn't use the
> nomenclature in code) is that with this proposal lld will still be
> atom based, we will just be extending atoms to support multiple
> symbols. The logic for splitting sections into atoms would become
>
> * ELF/COFF: one atom per section.
> * MachO: One atom per global symbol.
>
I guess, looking back at Nick's comment:

"The atom model is a good fit for the llvm compiler model for all
architectures.  There is a one-to-one mapping between llvm::GlobalObject
(e.g. function or global variable) and lld::DefinedAtom."

it seems that the primary issue on the ELF/COFF side is that currently the
LLVM backends are taking a finer-grained atomicity that is present inside
LLVM, and losing information by converting that to a coarser-grained
atomicity that is the typical "section" in ELF/COFF.
But doesn't -ffunction-sections -fdata-sections already fix this, basically?

On the Mach-O side, the issue seems to be that Mach-O's notion of section
carries more hard-coded meaning than e.g. ELF, so at the very least another
layer of subdivision below what Mach-O calls "section" would be needed to
preserve this information; currently symbols are used as a bit of a hack as
this "sub-section" layer.

So the problem seems to be that the transport format between the compiler
and linker varies by platform, and each one has a different way to
represent things, some can't represent everything we want to do, apparently.

BUT it sounds like at least relocatable ELF semantics can, in principle,
represent everything that we can imagine an "atom-based file
format"/"native format" to want to represent. Just to play devil's advocate
here, let's start out with the "native format" being relocatable ELF - on
*all platforms*. Relocatable object files are just a transport format
between compiler and linker, after all; who cares what we use? If the
alternative is a completely new format, then bootstrapping from relocatable
ELF is strictly less churn/tooling cost.

People on the "atom side of the fence", what do you think? Is there
anything that we cannot achieve by saying "native"="relocatable ELF"?

-- Sean Silva

Nick Kledzik

2015-May-29 01:25 UTC

[LLVMdev] LLD improvement plan

On May 28, 2015, at 5:42 PM, Sean Silva <chisophugis at gmail.com> wrote:
> I guess, looking back at Nick's comment:
>
> "The atom model is a good fit for the llvm compiler model for all
> architectures.  There is a one-to-one mapping between llvm::GlobalObject
> (e.g. function or global variable) and lld::DefinedAtom."
>
> it seems that the primary issue on the ELF/COFF side is that currently the
> LLVM backends are taking a finer-grained atomicity that is present inside
> LLVM, and losing information by converting that to a coarser-grained
> atomicity that is the typical "section" in ELF/COFF.
> But doesn't -ffunction-sections -fdata-sections already fix this, basically?
>
> On the Mach-O side, the issue seems to be that Mach-O's notion of section
> carries more hard-coded meaning than e.g. ELF, so at the very least another
> layer of subdivision below what Mach-O calls "section" would be needed to
> preserve this information; currently symbols are used as a bit of a hack as
> this "sub-section" layer.

I’m not sure what you mean here.

> So the problem seems to be that the transport format between the compiler
> and linker varies by platform, and each one has a different way to
> represent things, some can't represent everything we want to do, apparently.

Yes!

> BUT it sounds like at least relocatable ELF semantics can, in principle,
> represent everything that we can imagine an "atom-based file
> format"/"native format" to want to represent. Just to play devil's advocate
> here, let's start out with the "native format" being relocatable ELF - on
> *all platforms*. Relocatable object files are just a transport format
> between compiler and linker, after all; who cares what we use? If the
> alternative is a completely new format, then bootstrapping from relocatable
> ELF is strictly less churn/tooling cost.
>
> People on the "atom side of the fence", what do you think? Is there
> anything that we cannot achieve by saying "native"="relocatable ELF"?

1) Turns out .o files are written once but read many times by the linker. 
Therefore, the design goal of .o files should be that they are as fast to
read/parse in the linker as possible.  Slowing down the compiler to make a .o
file that is faster for the linker to read is a good trade off.  This is the
motivation for the native format - not that it is a universal format.

2) I think the ELF camp still thinks that linkers are “dumb”.  That they just
collate .o files into executable files.  The darwin linker does a lot of
processing/optimizing the content (e.g. Objective-C optimizing, dead stripping,
function/data re-ordering).  This is why atom level granularity is needed.
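
To illustrate the kind of processing meant here, dead stripping can be modelled as reachability over a graph of atoms. The Python sketch below is a generic mark phase under that assumption; the function name `dead_strip`, the dictionary-based reference graph, and the sample atom names are hypothetical, and it does not describe the darwin linker's actual implementation.

```python
def dead_strip(references, roots):
    """Return the set of live atoms reachable from `roots`.

    `references` maps an atom name to the atom names it refers to
    (for example through relocations).
    """
    live = set()
    worklist = list(roots)
    while worklist:
        atom = worklist.pop()
        if atom in live:
            continue
        live.add(atom)
        worklist.extend(references.get(atom, ()))
    return live


# Hypothetical reference graph: _unused is never reached from _main.
refs = {
    "_main": ["_helper", "_table"],
    "_helper": ["_table"],
    "_table": [],
    "_unused": ["_helper"],
}
live_atoms = dead_strip(refs, ["_main"])
dead_atoms = set(refs) - live_atoms
```

The finer the unit in the graph, the more the linker can drop: with one node per function or variable, `_unused` goes away even if it was compiled into the same object file as `_helper`.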

For darwin, ELF-based .o files are not interesting.  It won’t be faster, and it
will take a bunch of effort to figure out how to encode all the mach-o info into
ELF.  We’d rather wait for a new native format.

-Nick
