I've run into some issues with the current atom object model that I would like to fix.

The current four atom kinds are not expressive enough. We need to be able to serialize a larger set of atoms, many of which are format-specific.

The set of common atoms (shared between all formats) should be the set that the resolver requires to work. Judging by the source code, SharedLibrary is not part of that set.

The driving use case for me is the Import Address Table (IAT) in PE/COFF. It is a section created by the writer that specifies the external symbols to import and then acts as the GOT/PLT at runtime. Building this table requires extra information to be maintained in an efficient format. It also needs to be an atom so that relocations can point to it. However, it does not have a well-defined size or content until the table is complete.

The File interface for atoms should be changed to

    File::iterator begin();
    File::iterator end();

where File::iterator is some type of iterator over Atom.

As for serialization, each atom can have its own serialize/unserialize function for both the Native format and YAML. This would also let ContentType drop many of its format-specific values, and it would allow us to get rid of isThumb as a DefinedAtom-level attribute.

- Michael Spencer
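A minimal C++ sketch of the proposal above, assuming a single polymorphic Atom base class: File exposes one begin()/end() pair over every atom, and each atom kind, including a format-specific one, serializes itself. All names here (ImportTableAtom, serializeYAML, serializeFile, addAtom) are hypothetical illustrations, not lld's API, and the "YAML" emitted is a simplified fragment rather than lld's actual schema.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every atom kind, including format-specific
// ones, derives from Atom and knows how to serialize itself.
class Atom {
public:
  virtual ~Atom() = default;
  virtual std::string name() const = 0;
  virtual void serializeYAML(std::ostream &os) const = 0;
};

// An ordinary chunk of code or data.
class DefinedAtom : public Atom {
public:
  DefinedAtom(std::string atomName, std::vector<unsigned char> content)
      : name_(std::move(atomName)), content_(std::move(content)) {}
  std::string name() const override { return name_; }
  void serializeYAML(std::ostream &os) const override {
    os << "- name: " << name_ << "\n  size: " << content_.size() << "\n";
  }

private:
  std::string name_;
  std::vector<unsigned char> content_;
};

// Hypothetical format-specific atom for a PE/COFF import table: its size
// is only known once every import has been added.
class ImportTableAtom : public Atom {
public:
  std::string name() const override { return "__iat"; }
  void addImport(std::string symbol) { imports_.push_back(std::move(symbol)); }
  void serializeYAML(std::ostream &os) const override {
    os << "- name: " << name() << "\n  imports: " << imports_.size() << "\n";
  }

private:
  std::vector<std::string> imports_;
};

// Proposed File interface: one iterator range over all atoms instead of
// one list per atom kind.
class File {
public:
  using AtomList = std::vector<std::unique_ptr<Atom>>;
  using iterator = AtomList::const_iterator;

  void addAtom(std::unique_ptr<Atom> atom) { atoms_.push_back(std::move(atom)); }
  iterator begin() const { return atoms_.begin(); }
  iterator end() const { return atoms_.end(); }

private:
  AtomList atoms_;
};

// Serialization becomes a single loop; virtual dispatch selects each
// kind's own format.
std::string serializeFile(const File &file) {
  std::ostringstream os;
  for (const std::unique_ptr<Atom> &atom : file)
    atom->serializeYAML(os);
  return os.str();
}
```

The trade-off this makes visible is the one debated in the replies: a single heterogeneous list is easy to extend with new kinds, but code that only cares about one kind (for example, the resolver looking for undefined atoms) must filter or downcast.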
On Jul 18, 2012, at 12:52 PM, Michael Spencer wrote:
> [...]
>
> The driving use case for this for me is the Import Address Table in
> PE/COFF. It is a section created by the writer that specifies external
> symbols to import and then acts as the GOT/PLT at runtime. Building
> this table requires extra information to be maintained in an efficient
> format. It also needs to be an atom so that relocations can point to
> it. However it does not have a well defined size or content until the
> table is complete.

Why is the IAT not just constructed in the PECOFF Writer? Why does it need to be an Atom? What relocations need to point to it? If they are relocations created by the Writer, you are fine. If you mean that other atoms may have References (in)to the IAT, then that is what SharedLibraryAtoms are for. They are placeholders that expand to something real in the Writer.

Mach-O has all kinds of crazy data structures that are constructed in the Writer. This is different from Darwin ld64, where the Writer actually created atoms for its data structures and fed them back to the resolver. I wanted to avoid that insanity in lld.

The Writer is handed a list of atoms from which to construct the executable. It is free to create more atoms (private to the Writer) or just lay down data structures, whichever is easier.

>
> The File interface for atoms should be changed to File::iterator
> begin(); File::iterator end(); where File::iterator is some type of
> iterator over Atom.
>
> As for serialization. Each atom can have its own serialize/unserialize
> function for both the Native format and YAML.

The four Atom kinds each have very different attributes and are used differently. That is why I broke them out into separate lists.

> This would also change ContentType to not contain so many format
> specific values. It would also allow us to get rid of isThumb as a
> DefinedAtom level attribute.

I'm all for getting rid of isThumb(), but that seems orthogonal to your issue.

-Nick
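Nick's alternative can be sketched the same way: the PE/COFF Writer receives ordinary SharedLibraryAtoms from the resolver, builds the IAT as a private data structure once every import is known, and then fixes up references to each SharedLibraryAtom to point at that atom's slot. This is a simplified illustration under stated assumptions: the type and method names are hypothetical, symbol names are assumed unique across DLLs, and only slot layout is modeled (each DLL's run of entries ends in a null entry, as in PE import tables). The import directory, hint/name table, and section placement are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for what the resolver hands the Writer
// (hypothetical fields, not lld's class).
struct SharedLibraryAtom {
  std::string name;      // imported symbol
  std::string loadName;  // DLL that exports it
};

// Writer-private Import Address Table. It is not an Atom and the
// resolver never sees it; its size is known only after layout().
class ImportAddressTable {
public:
  ImportAddressTable(std::uint64_t baseAddress, std::uint64_t slotSize)
      : base_(baseAddress), slotSize_(slotSize) {}

  // Record each SharedLibraryAtom the Writer was handed.
  void add(const SharedLibraryAtom &atom) {
    importsByDll_[atom.loadName].push_back(atom.name);
  }

  // Assign slots once all imports are known: each DLL gets a contiguous
  // run of entries followed by one null terminator entry.
  void layout() {
    std::uint64_t slot = 0;
    for (const auto &dll : importsByDll_) {
      for (const std::string &symbol : dll.second)
        slotIndex_[symbol] = slot++;
      ++slot; // null terminator for this DLL's run
    }
    numSlots_ = slot;
  }

  std::uint64_t size() const { return numSlots_ * slotSize_; }

  // Address that a reference to the SharedLibraryAtom `symbol` is fixed up
  // to. std::map::at throws std::out_of_range for an unknown symbol.
  std::uint64_t addressOf(const std::string &symbol) const {
    return base_ + slotIndex_.at(symbol) * slotSize_;
  }

private:
  std::uint64_t base_;
  std::uint64_t slotSize_;  // 8 bytes for PE32+, 4 for PE32
  std::uint64_t numSlots_ = 0;
  std::map<std::string, std::vector<std::string>> importsByDll_;
  std::map<std::string, std::uint64_t> slotIndex_;
};
```

In this shape the IAT's unknown-until-complete size is not a problem for the resolver, because the table only comes into existence after resolution has finished.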
On Jul 18, 2012, at 3:41 PM, Clow, Marshall wrote:
> On Jul 18, 2012, at 2:34 PM, Nick Kledzik wrote:
>> On Jul 18, 2012, at 12:52 PM, Michael Spencer wrote:
>>> [...]
>> [...]
>>
>> The four Atom kinds each have very different attributes and are used
>> differently, That is why I broke them out into separate lists.
>
> [ Just trying to understand here. ]
>
> So, what I'm hearing is that there are four different kinds of Atoms.
> No more, no less - matching the enum in Atom.h.
> Is that correct?

Stated that way, it makes the "four" seem arbitrary. It makes more sense once you see that the four kinds are:

1) DefinedAtom
   95% of all atoms. This is a chunk of code or data.
2) UndefinedAtom
   This is a placeholder in object files for a reference to some atom outside the translation unit. During core linking it is usually replaced by (coalesced into) another Atom.
3) SharedLibraryAtom
   Used when a required symbol name turns out to be defined in a dynamic shared library (and not in some object file). A SharedLibraryAtom is the placeholder Atom used to represent that fact. It is similar to an UndefinedAtom, but it also tracks information about the associated shared library.
4) AbsoluteAtom
   This is for embedded support, where some code or data lives in ROM at a fixed address. This atom has no content. It is just an address, and the Writer must fix up any references to it to point at that address.

> The readers generate a list of atoms from some object format.
> The linker does a bunch of graph stuff on the atoms.
> The writers get a list of (interconnected) atoms, and write an executable from that.
>
> Am I missing something?

That is the high-level summary.

-Nick
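The four kinds and the reader/resolver/writer flow summarized above can be sketched as follows. This is a simplified model, not lld's Atom.h: each kind is a separate struct carrying only its own attributes, a Reader's output is a File holding one list per kind, and "core linking" is reduced to a single question, namely which UndefinedAtoms have no Defined, SharedLibrary, or Absolute atom of the same name to be coalesced into. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-kind types: each kind carries only its own attributes.
struct DefinedAtom { std::string name; std::vector<std::uint8_t> content; };
struct UndefinedAtom { std::string name; };
struct SharedLibraryAtom { std::string name; std::string loadName; };
struct AbsoluteAtom { std::string name; std::uint64_t value; };  // no content

// What a Reader produces: one list per atom kind.
struct File {
  std::vector<DefinedAtom> defined;
  std::vector<UndefinedAtom> undefined;
  std::vector<SharedLibraryAtom> sharedLibrary;
  std::vector<AbsoluteAtom> absolute;
};

// Returns, sorted and de-duplicated, the names of UndefinedAtoms that
// cannot be coalesced into any Defined, SharedLibrary, or Absolute atom.
std::vector<std::string> unresolvedNames(const std::vector<File> &files) {
  std::set<std::string> available;
  for (const File &f : files) {
    for (const auto &a : f.defined) available.insert(a.name);
    for (const auto &a : f.sharedLibrary) available.insert(a.name);
    for (const auto &a : f.absolute) available.insert(a.name);
  }
  std::set<std::string> missing;
  for (const File &f : files)
    for (const auto &u : f.undefined)
      if (available.count(u.name) == 0) missing.insert(u.name);
  return std::vector<std::string>(missing.begin(), missing.end());
}
```

Keeping separate lists means a pass like this touches only the kinds it needs, without downcasting; the cost is that adding a fifth kind changes every File and every pass that walks all atoms.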
On Jul 18, 2012, at 3:55 PM, Nick Kledzik wrote:
> On Jul 18, 2012, at 3:41 PM, Clow, Marshall wrote:
>> [...]
> Stated that way, it makes the "four" seem arbitrary. It makes more sense once
> you see that the four kinds are:
> [...]
> That is the high level summary.

Nick -- I've got no problem with the number "four" ;-)
I just want to make sure that I'm understanding what's going on, so I can make intelligent commentary.

Thanks for the elaboration. Can I paste that description into design.rst?

-- Marshall

Marshall Clow     Idio Software   <mailto:mclow.lists at gmail.com>