thr3ads.net - llvm dev - [LLVMdev] LLD improvement plan [May 2015]

David Blaikie

2015-May-11 18:21 UTC

[LLVMdev] LLD improvement plan

On Mon, May 11, 2015 at 11:13 AM, Rui Ueyama <ruiu at google.com> wrote:
> If you attach two or more symbols along with offsets to a chunk of data,
> it would be pretty similar to a section. That means that if you want to
> do something on the atom model, now you have to treat the atoms like
> sections.
>
What do you lose/pay by having to treat the atoms like sections?

> It looks like a bad mix of the two.
>
> On Mon, May 11, 2015 at 10:56 AM, James Y Knight <jyknight at google.com>
> wrote:
>
>> Nobody in this long thread appears to have yet explained why it's a bad
>> idea to allow atomic fragments of code/data (whatever you want to call
>> them: atoms, sections, who cares) to have more than one global symbol
>> attached to them in LLD's internal representation.
>>
>> That seems like it'd provide the flexibility needed for ELF without
>> hurting MachO. If that change'd allow you to avoid splitting the linker
>> into two-codebases-in-one, isn't that preferable?
>>
>>
>> On Thu, May 7, 2015 at 9:38 AM, Joerg Sonnenberger <
>> joerg at britannica.bec.de> wrote:
>>
>>> On Wed, May 06, 2015 at 09:28:54PM -0500, Shankar Easwaran wrote:
>>> > The atom model allowed lld to have a single intermediate
>>> > representation for all the formats ELF/COFF/Mach-O. The native model
>>> > allowed the intermediate representation to be serialized to disk
>>> > too. If the intermediate representation's data structures are made
>>> > available to scripting languages, most linker script
>>> > layout can be implemented by the end user. A new language can also
>>> > be developed as most of the users need it, and it can work on this
>>> > intermediate representation.
>>> >
>>> > The atom model also simplified a lot of use cases like garbage
>>> > collection and having the resolver deal just with atoms. The
>>> > section model would sound simple from the outside but it has its
>>> > own challenges like separating the symbol information from section
>>> > information.
>>>
>>> I'm sorry, but I don't get why any of this requires an atom based
>>> representation. Claiming a single intermediate representation for
>>> ELF/COFF on one hand and Mach-O on the other is ironic given the already
>>> mentioned hacks on various layers. Garbage collection doesn't become
>>> more expensive when attaching more than one symbol to each code/data
>>> fragment. Symbol resolution doesn't change when attaching more than one
>>> symbol to each code/data fragment. The list goes on. The single natural
>>> advantage is that you can use a single pointer to the canonical symbol
>>> from a code/data fragment and don't have to use a list/array. Given the
>>> necessary and expensive hacks for splitting sections into (pseudo)
>>> atoms, that doesn't feel like a win. So once again, what actual
>>> advantages for ELF or COFF have been created by the atom model? Mach-O
>>> hardly counts as it doesn't allow the flexibility of the section model
>>> as has been discussed before.
>>>
>>> Joerg
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Rui Ueyama

2015-May-11 19:01 UTC

[LLVMdev] LLD improvement plan

On Mon, May 11, 2015 at 11:21 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
> On Mon, May 11, 2015 at 11:13 AM, Rui Ueyama <ruiu at google.com> wrote:
>
>> If you attach two or more symbols along with offsets to a chunk of data,
>> it would be pretty similar to a section. That means that if you want to
>> do something on the atom model, now you have to treat the atoms like
>> sections.
>>
>
> What do you lose/pay by having to treat the atoms like sections?
>
I can think of a few, maybe more.

An atom model with multiple names is not as simple as before.

We still need to read all relocation tables to complete a graph as they
form edges, even for duplicate COMDAT sections.

It still can't model some section-linker features, such as "select largest"
COMDAT sections, because the new atom is still different from the section.
(This is not an issue if we really treat atoms like sections, but that
means in turn we would be going to create a Mach-O linker based on the
section model.)
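
To make the data structure under discussion concrete, here is a minimal sketch of a code/data fragment that carries more than one symbol, with relocations acting as the graph edges used for dead stripping. It is only an illustration of the idea debated in this thread, not LLD's actual representation: the `Fragment` class, `live_fragments` function, and all symbol names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fragment:
    """An indivisible chunk of code/data (call it an atom or a section)."""
    name: str
    size: int
    # Zero or more symbols defined inside the fragment: symbol name -> offset.
    symbols: Dict[str, int] = field(default_factory=dict)
    # Names of symbols this fragment refers to; these are the graph edges.
    relocations: List[str] = field(default_factory=list)


def live_fragments(fragments, roots):
    """Mark phase of dead stripping: names of fragments reachable from roots."""
    # A single table from symbol name to defining fragment, however many
    # symbols each fragment carries.
    defined_in = {sym: frag for frag in fragments for sym in frag.symbols}
    live = set()
    worklist = [defined_in[r] for r in roots if r in defined_in]
    while worklist:
        frag = worklist.pop()
        if frag.name in live:
            continue
        live.add(frag.name)
        for target in frag.relocations:
            if target in defined_in:
                worklist.append(defined_in[target])
    return live


fragments = [
    Fragment("text.main", 32, {"main": 0}, ["memcpy"]),
    # One fragment carrying two global symbols at different offsets.
    Fragment("text.mem", 64, {"memcpy": 0, "memmove": 16}, []),
    Fragment("text.unused", 16, {"unused": 0}, ["memmove"]),
]
print(sorted(live_fragments(fragments, ["main"])))  # ['text.main', 'text.mem']
```

In this sketch the traversal visits each fragment once regardless of how many symbols it defines, but every fragment's relocation list still has to be available to build the edges, which is the cost raised above.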


Kevin Enderby

2015-May-11 21:05 UTC

[LLVMdev] LLD improvement plan

Hello,

As the guy that came up with the “temporary” Mach-O hack to use
.subsections_via_symbols a long while back, I wanted to throw out my thinking,
though these are old thoughts from before the atom concept existed.

Back in the day we needed some way to do dead code stripping at the static link
editor level, as code coming from the Mac OS 9 world would simply not link
without this feature when moving to what was to become Mac OS X (there was much
unused code that had references to undefined symbols).  So we needed a quick hack
and a static link editor that implemented dead code stripping asap.

I did not feel it was the linker’s place to divide up sections into smaller
parts, as this really should be done by the compiler that understood the language
semantics it was translating.  That covers things like blocks of code with multiple
entry points (what Fortran does), C++ constructs such as explicit vs implicit
instantiation, and which blocks of code, data, exception info, and debug info
make up a group (for things like comdat).

But I felt that to do this right we would have to extend the object file format and
the assembly language to add support for subsections.  Then let the compiler
group things into the smallest indivisible part of the language for
translation. Then symbols would be part of a specific subsection and subsections
could have zero or more symbols.  Having zero symbols for some blocks, with the
only references to them going through assembly temporary symbols that got removed,
I felt would be cleaner than carrying along fake symbols that must be there to
divide up the section later.

The relocation entries would be per subsection and the subsection would carry
the alignment directly (so it would not have to be inferred by the hack of
dividing up the section).  The design would allow subsections to have types (regular,
literals, etc.), and signature symbols (weak and non-weak) for things like
comdat.
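
The subsection design described above could be sketched roughly as follows. This is only an illustration of the description, not an existing object file format: the `Subsection` class, its field names, and the `lay_out` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subsection:
    """Hypothetical compiler-emitted subsection, per the description above."""
    kind: str        # e.g. "regular" or "literals"
    size: int        # size in bytes
    alignment: int   # carried directly; assumed to be a power of two
    symbols: Dict[str, int] = field(default_factory=dict)  # zero or more
    relocations: List[str] = field(default_factory=list)   # per subsection
    signature: Optional[str] = None  # signature symbol for comdat-like groups
    signature_weak: bool = False


def align_up(offset, alignment):
    """Round offset up to the next multiple of a power-of-two alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


def lay_out(subsections):
    """Assign each subsection a start offset using its own alignment."""
    offsets = []
    cursor = 0
    for sub in subsections:
        cursor = align_up(cursor, sub.alignment)
        offsets.append(cursor)
        cursor += sub.size
    return offsets


subs = [
    Subsection("regular", size=10, alignment=4, symbols={"_foo": 0}),
    Subsection("literals", size=8, alignment=8),  # no symbols at all
    Subsection("regular", size=3, alignment=1, symbols={"_bar": 0},
               signature="_bar", signature_weak=True),
]
print(lay_out(subs))  # [0, 16, 24]
```

Because each subsection states its alignment, the layout step never has to guess it from where a symbol happened to fall inside a larger section.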

But doing this would have required extensive changes to the BSD derived
Mach-O object file format that we started with, as we had a symbol struct with
only an 8 bit section number, and we had another hack back in the 32-bit days to
do scattered relocation entries based on 24 bits of address reference to allow
references for pic code like RefSymbol - PicBaseSym.
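
As a rough illustration of why those field widths were limiting, the arithmetic can be written out. The two widths come from the paragraph above; the constant and helper names are made up for this sketch.

```python
N_SECT_BITS = 8           # width of the section number field in the symbol struct
SCATTERED_ADDR_BITS = 24  # width of the address field in a scattered relocation


def max_unsigned(bits):
    """Largest value an unsigned field of the given width can hold."""
    return (1 << bits) - 1


def fits(value, bits):
    """True if value is representable in an unsigned field of that width."""
    return 0 <= value <= max_unsigned(bits)


print(max_unsigned(N_SECT_BITS))             # 255
print(max_unsigned(SCATTERED_ADDR_BITS))     # 16777215
print(fits(0x1000000, SCATTERED_ADDR_BITS))  # False
```

An 8-bit section number cannot index one subsection per function in a large object file, and a 24-bit address field covers only 16 MiB.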

All this could have been changed but it was deemed too aggressive and the atom
model of the ld64 design prevailed as it exists today in the Mac OS X linker.

But today with clang one could see the compiler being the one deciding the
smallest unit based on the target.  In the Mac OS X and iOS cases that would be
the atoms, and for other targets that would be sections.  So maybe by stepping
back up the compiler chain the LLD improvement plan could be better.

Some old thoughts,
Kev