thr3ads.net - llvm dev - [LLVMdev] [lld] Atom object model refactoring. [Jul 2012]

Michael Spencer

2012-Jul-18 19:52 UTC

[LLVMdev] [lld] Atom object model refactoring.

I've run into some issues with the current atom object model that I
would like to fix.

The current 4 atoms are not expressive enough. We need to be able to
serialize a larger set of atoms, many of which are format specific.

The set of common atoms (shared between all formats) should be the set
that the resolver requires to work. SharedLibrary is not included in
this (by looking at the source code).

The driving use case for this for me is the Import Address Table in
PE/COFF. It is a section created by the writer that specifies external
symbols to import and then acts as the GOT/PLT at runtime. Building
this table requires extra information to be maintained in an efficient
format. It also needs to be an atom so that relocations can point to
it. However, it does not have a well-defined size or content until the
table is complete.

The File interface for atoms should be changed to File::iterator
begin(); File::iterator end(); where File::iterator is some type of
iterator over Atom.
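
A minimal sketch of what such an interface could look like. The names
`Atom`, `File`, `addAtom`, and `countAtoms` are hypothetical stand-ins for
illustration, not lld's actual classes; the only part taken from the
proposal is the `begin()`/`end()` pair returning an iterator over Atom.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for lld's Atom base class.
struct Atom {
  explicit Atom(std::string n) : name(std::move(n)) {}
  virtual ~Atom() = default;
  std::string name;
};

// Hypothetical File that exposes all of its atoms through one iterator type.
class File {
  std::vector<std::unique_ptr<Atom>> atoms_;

public:
  // Forward iterator that hides the storage and yields `const Atom &`.
  class iterator {
    std::vector<std::unique_ptr<Atom>>::const_iterator it_;

  public:
    explicit iterator(std::vector<std::unique_ptr<Atom>>::const_iterator it)
        : it_(it) {}
    const Atom &operator*() const { return **it_; }
    iterator &operator++() {
      ++it_;
      return *this;
    }
    bool operator!=(const iterator &other) const { return it_ != other.it_; }
  };

  void addAtom(std::unique_ptr<Atom> atom) {
    assert(atom && "null atoms are not allowed");
    atoms_.push_back(std::move(atom));
  }

  iterator begin() const { return iterator(atoms_.begin()); }
  iterator end() const { return iterator(atoms_.end()); }
};

// Usage: a range-based for loop works directly on a File.
inline std::size_t countAtoms(const File &file) {
  std::size_t n = 0;
  for (const Atom &atom : file) {
    (void)atom; // only counting here
    ++n;
  }
  return n;
}
```

With a single iterator, format-specific atom kinds could be added as
subclasses of `Atom` without widening the File interface.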

As for serialization, each atom can have its own serialize/unserialize
function for both the Native format and YAML.

This would also change ContentType to not contain so many format
specific values. It would also allow us to get rid of isThumb as a
DefinedAtom level attribute.

- Michael Spencer

Nick Kledzik

2012-Jul-18 21:34 UTC

On Jul 18, 2012, at 12:52 PM, Michael Spencer wrote:
> I've run into some issues with the current atom object model that I
> would like to fix.
> 
> The current 4 atoms are not expressive enough. We need to be able to
> serialize a larger set of atoms, many of which are format specific.
> 
> The set of common atoms (shared between all formats) should be the set
> that the resolver requires to work. SharedLibrary is not included in
> this (by looking at the source code).
> 
> The driving use case for this for me is the Import Address Table in
> PE/COFF. It is a section created by the writer that specifies external
> symbols to import and then acts as the GOT/PLT at runtime. Building
> this table requires extra information to be maintained in an efficient
> format. It also needs to be an atom so that relocations can point to
> it. However it does not have a well defined size or content until the
> table is complete.

Why is the IAT not just constructed in the PECOFF Writer?  Why does it need
to be an Atom?  What relocations need to point to it?  If they are relocations
created by the Writer, you are fine.  If you mean that other atoms may
have References (in)to the IAT, then that is what SharedLibraryAtoms are
for.  They are placeholders that expand to something real in the Writer.

Mach-O has all kinds of crazy data structures that are constructed in the
Writer.
This is different from Darwin ld64, where the Writer actually created atoms for
its data structures and fed them back to the resolver.  I wanted to avoid
that insanity in lld.

The Writer is handed a list of atoms from which to construct the executable.
It is free to create more atoms (private to the Writer) or just lay down data
structures - whichever is easier.
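
To illustrate the idea of placeholders expanding to something real in the
Writer, here is a rough sketch of a Writer-private import table. Everything
in it (`SharedLibraryAtom` as a plain struct, `ImportTableBuilder`,
`slotFor`, `sizeInBytes`) is a hypothetical name, and it ignores the details
of the real PE/COFF import table layout.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical placeholder for a symbol that lives in a shared library.
struct SharedLibraryAtom {
  std::string symbolName;
  std::string libraryName;
};

// Writer-private table: one slot per distinct imported symbol.
class ImportTableBuilder {
  std::map<std::string, std::size_t> slotBySymbol_;
  std::vector<const SharedLibraryAtom *> slots_;

public:
  // Returns the slot index for the atom, creating a slot on first use.
  std::size_t slotFor(const SharedLibraryAtom &atom) {
    auto found = slotBySymbol_.find(atom.symbolName);
    if (found != slotBySymbol_.end())
      return found->second;
    std::size_t index = slots_.size();
    slots_.push_back(&atom);
    slotBySymbol_.emplace(atom.symbolName, index);
    return index;
  }

  // The size is only known once every reference has been visited.
  std::size_t sizeInBytes(std::size_t pointerSize) const {
    assert(pointerSize == 4 || pointerSize == 8);
    return slots_.size() * pointerSize;
  }
};
```

In this sketch the table never has to be an atom seen by the resolver: the
Writer walks references to SharedLibraryAtoms, assigns slots, and fixes up
each reference to point at its slot once the table is complete.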

> 
> The File interface for atoms should be changed to File::iterator
> begin(); File::iterator end(); where File::iterator is some type of
> iterator over Atom.
> 
> As for serialization. Each atom can have its own serialize/unserialize
> function for both the Native format and YAML.

The four Atom kinds each have very different attributes and are used
differently.  That is why I broke them out into separate lists.

> This would also change ContentType to not contain so many format
> specific values. It would also allow us to get rid of isThumb as a
> DefinedAtom level attribute.

I'm all for getting rid of isThumb(), but that seems orthogonal to your
issue.

-Nick

Nick Kledzik

2012-Jul-18 22:55 UTC

On Jul 18, 2012, at 3:41 PM, Clow, Marshall wrote:
> On Jul 18, 2012, at 2:34 PM, Nick Kledzik wrote:
>> On Jul 18, 2012, at 12:52 PM, Michael Spencer wrote:
>>> I've run into some issues with the current atom object model that I
>>> would like to fix.
>>>
>>> The current 4 atoms are not expressive enough. We need to be able to
>>> serialize a larger set of atoms, many of which are format specific.
>>>
>>> The set of common atoms (shared between all formats) should be the set
>>> that the resolver requires to work. SharedLibrary is not included in
>>> this (by looking at the source code).
>>>
>>> The driving use case for this for me is the Import Address Table in
>>> PE/COFF. It is a section created by the writer that specifies external
>>> symbols to import and then acts as the GOT/PLT at runtime. Building
>>> this table requires extra information to be maintained in an efficient
>>> format. It also needs to be an atom so that relocations can point to
>>> it. However it does not have a well defined size or content until the
>>> table is complete.
>> Why is the IAT not just constructed in the PECOFF Writer?  Why does it need
>> to be an Atom?   What relocations need to point to it?  If they are relocations
>> created by the Writer, you are fine.  If you mean that other atoms may
>> have References (in)to the IAT, then that is what SharedLibraryAtoms are
>> for.  They are place holders that expand to something real in the Writer.
>>
>> Mach-o has all kinds of crazy data structures that are constructed in the Writer.
>> This is different than Darwin ld64 where the Writer actually created atoms for
>> its data structures and feed them back to the resolver.  I wanted to avoid
>> that insanity in lld.
>>
>> The Writer is handed a list of atoms from which to construct the executable.
>> It is free to create more atoms (private to the Writer) or just lay down data
>> structures - which ever is easier.
>>
>>> The File interface for atoms should be changed to File::iterator
>>> begin(); File::iterator end(); where File::iterator is some type of
>>> iterator over Atom.
>>>
>>> As for serialization. Each atom can have its own serialize/unserialize
>>> function for both the Native format and YAML.
>> 
>> The four Atom kinds each have very different attributes and are used 
>> differently,  That is why I broke them out into separate lists.  
> 
> [ Just trying to understand here. ]
> 
> So, what I'm hearing is that there are four different kinds of Atoms.
> No more, no less - matching the enum in Atom.h.
> Is that correct?

Stated that way, it makes the "four" seem arbitrary.  It makes more sense once
you see that the four kinds are:

1) DefinedAtom
     95% of all atoms.  This is a chunk of code or data.
2) UndefinedAtom
     This is a placeholder in object files for a reference to some atom
     outside the translation unit.  During core linking it is usually
     replaced by (coalesced into) another Atom.
3) SharedLibraryAtom
     If a required symbol name turns out to be defined in a dynamic shared
     library (and not some object file), a SharedLibraryAtom is the
     placeholder Atom used to represent that fact.  It is similar to an
     UndefinedAtom, but it also tracks information about the associated
     shared library.
4) AbsoluteAtom
     This is for embedded support where some stuff is implemented in ROM at
     some fixed address.  This atom has no content.  It is just an address
     that the Writer needs to fix up any references to point to.
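
As a rough picture of why the four kinds end up in separate lists, the
sketch below gives each kind only the attributes described above. The
struct layouts, field names, `AtomLists`, and `makeExample` are assumptions
made for illustration and do not match lld's real headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Each kind carries different attributes (all field names are hypothetical).
struct DefinedAtom {
  std::string name;
  std::vector<std::uint8_t> content; // a chunk of code or data
};
struct UndefinedAtom {
  std::string name; // only a name to be resolved later
};
struct SharedLibraryAtom {
  std::string name;
  std::string loadName; // which shared library provides the symbol
};
struct AbsoluteAtom {
  std::string name;
  std::uint64_t address; // fixed address, no content
};

// One list per kind, so no code has to test the kind before using an atom.
struct AtomLists {
  std::vector<DefinedAtom> defined;
  std::vector<UndefinedAtom> undefined;
  std::vector<SharedLibraryAtom> sharedLibrary;
  std::vector<AbsoluteAtom> absolute;

  std::size_t total() const {
    return defined.size() + undefined.size() + sharedLibrary.size() +
           absolute.size();
  }
};

// Example: a tiny file holding one atom of each kind.
inline AtomLists makeExample() {
  AtomLists file;
  file.defined.push_back(DefinedAtom{"_main", {0xC3}});
  file.undefined.push_back(UndefinedAtom{"_helper"});
  file.sharedLibrary.push_back(SharedLibraryAtom{"_printf", "libc.dylib"});
  file.absolute.push_back(AbsoluteAtom{"_rom_reset", 0xFFFF0000});
  assert(file.total() == 4);
  return file;
}
```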
> 
> The readers generate a list of atoms from some object format.
> The linker does a bunch of graph stuff on the atoms.
> The writers get a list of (interconnected) atoms, and write an executable from that.
> 
> Am I missing something?

That is the high level summary.
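
The "graph stuff" includes coalescing UndefinedAtoms into whatever atom
ends up providing the symbol. A deliberately simplified sketch of that one
step follows; `Kind`, `Atom`, `resolve`, and `unresolved` are hypothetical
names, and duplicate definitions are simply ignored here (the first one
wins), which a real resolver would not do.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Kind { Defined, Undefined, SharedLibrary, Absolute };

// Hypothetical flattened atom: just a name and a kind.
struct Atom {
  std::string name;
  Kind kind;
};

// Keep one atom per name. An UndefinedAtom already in the table is replaced
// by any later atom with the same name; other existing atoms are kept.
inline std::map<std::string, Atom> resolve(const std::vector<Atom> &input) {
  std::map<std::string, Atom> table;
  for (const Atom &atom : input) {
    assert(!atom.name.empty() && "this sketch only handles named atoms");
    auto result = table.emplace(atom.name, atom);
    bool inserted = result.second;
    Atom &existing = result.first->second;
    if (!inserted && existing.kind == Kind::Undefined)
      existing = atom;
  }
  return table;
}

// Names still undefined after resolution would be reported as link errors.
inline std::vector<std::string>
unresolved(const std::map<std::string, Atom> &table) {
  std::vector<std::string> names;
  for (const auto &entry : table)
    if (entry.second.kind == Kind::Undefined)
      names.push_back(entry.first);
  return names;
}
```

The Writer would then receive only the atoms left in the table, with
SharedLibraryAtoms standing in for symbols that live in dynamic libraries.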


-Nick

Marshall Clow

2012-Jul-18 22:58 UTC

On Jul 18, 2012, at 3:55 PM, Nick Kledzik wrote:
> On Jul 18, 2012, at 3:41 PM, Clow, Marshall wrote:
>> On Jul 18, 2012, at 2:34 PM, Nick Kledzik wrote:
>>> The four Atom kinds each have very different attributes and are used
>>> differently,  That is why I broke them out into separate lists.
>>
>> [ Just trying to understand here. ]
>>
>> So, what I'm hearing is that there are four different kinds of Atoms.
>> No more, no less - matching the enum in Atom.h.
>> Is that correct?
> Stated that way, it makes the "four" seem arbitrary.  It makes more sense once
> you see that the four kinds are:
>
> 1) DefinedAtom
>      95% of all atoms.  This is a chunk of code or data
> 2) UndefinedAtom
>      This is a place holder in object files for a reference to some atom
>      outside the translation unit.
>      During core linking it is usually replaced by (coalesced into) another Atom.
> 3) SharedLibraryAtom
>      If a required symbol name turns out to be defined in a dynamic shared
>      library (and not some object file).  A SharedLibraryAtom is the
>      placeholder Atom used to represent that fact.
>      It is similar to an UndefinedAtom, but it also tracks information
>      about the associated shared library.
> 4) AbsoluteAtom
>      This is for embedded support where some stuff is implemented in ROM at
>      some fixed address.  This atom has no content.  It is just an address
>      that the Writer needs to fixup any references to point to.
>
>> The readers generate a list of atoms from some object format.
>> The linker does a bunch of graph stuff on the atoms.
>> The writers get a list of (interconnected) atoms, and write an executable from that.
>>
>> Am I missing something?
> That is the high level summary.

Nick --

I've got no problem with the number "four" ;-)

I just want to make sure that I'm understanding what's going on - so I
can make intelligent commentary.
Thanks for the elaboration.

Can I paste that description into design.rst?

-- Marshall

Marshall Clow     Idio Software   <mailto:mclow.lists at gmail.com>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly
moderated down to (-1, Flamebait).
        -- Yu Suzuki
