Hi,

There are a lot of advantages to continuing to improve the atom model and to building on it.

The atom model allowed lld to have a single intermediate representation for all the formats: ELF, COFF, and Mach-O. The native format allowed the intermediate representation to be serialized to disk too. If the intermediate representation's data structures are made available to scripting languages, most of the linker script layout functionality can be implemented by the end user. A new language could also be developed, since many users need one, and it could operate on this intermediate representation.

The atom model also simplified a lot of use cases, such as garbage collection, and let the resolver deal just with atoms. The section model may sound simple from the outside, but it has its own challenges, such as separating symbol information from section information.

The atom model also simplifies testing, as there is one uniform way to test the core linker independent of the format.

In addition to testing, there are tools designed to convert ELF to COFF or vice versa, use cases that lld supports by design.

Most of all, embedded users want to reduce the final image size by compiling code with -ffunction-sections and -fdata-sections, which the atom model models directly. Thanks to Espindola for adding support for -fno-unique-section-names, which makes lld and the atom model even more useful.

lld has already proven that it can link most of our llvm tools and self-host with reasonable performance; I don't see why we wouldn't want to continue with the atom model.

The atom model also eases dealing with LTO in general.

In summary, I would like to continue the ELF ports using the atom model.

_If a section model is chosen to model flavors, let's not mix it up with the atom model, as I can see there would be very little code sharing._

Shankar Easwaran

On 5/4/2015 2:52 PM, Chris Lattner wrote:
> On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
>> Proposal
>> Re-architect the linker based on the section model where it’s appropriate.
>> Stop simulating different linker semantics using the Unix model. Instead, directly implement the native behavior.
>
> Preface: I have never personally contributed code to LLD, so don’t take anything I’m about to say too seriously. This is not a mandate or anything, just an observation/idea.
>
> I think that there is an alternative solution to these exact same problems. What you’ve identified here is that there are two camps of people working on LLD, and they have conflicting goals:
>
> - Camp A: LLD is infrastructure for the next generation of awesome linking and toolchain features; it should take advantage of how compilers work to offer new features, performance, etc. without deep concern for compatibility.
>
> - Camp B: LLD is a drop-in replacement system linker (notably for COFF and ELF systems), which is best of breed and with no compromises w.r.t. that goal.
>
> I think the problem here is that these lead to natural and inescapable tensions, and Alex summarized how Camp B has been steering LLD away from what Camp A people want. This isn’t bad in and of itself, because what Camp B wants is clearly and unarguably good for LLVM. However, it is also not sufficient, and while innovation in the linker space (e.g. a new “native” object file format generated directly from compiler structures) may or may not actually “work” or be “worth it”, we won’t know unless we try, and that won’t fulfill its promise if there are compromises to Camp B.
> So here’s my counterproposal: two different linkers.
>
> Let’s stop thinking about lld as one linker, and instead think of it as two different ones. We’ll build a Camp B linker which is the best-of-breed section-based linker. It will support linker scripts and do everything better than any existing section-based linker. The first step of this is to do what Rui proposes and rip atoms out of the model.
>
> We will also build a no-holds-barred awesome atom-based linker that takes advantage of everything it can from LLVM’s architecture to enable innovative new tools without worrying too much about backwards compatibility.
>
> These two linkers should share whatever code makes sense, but also shouldn’t try to share code that doesn’t make sense. The split between the semantic model of sections vs. atoms seems like a very natural one to me.
>
> One question is: does it make sense for these to live in the same lld subproject, or be split into two different subprojects? I think the answer to that question is driven by whether there is shared code common between the two linkers that doesn’t make sense to sink down to the llvm subproject itself.
>
> What do you think?
>
> -Chris

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
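To make the representation being defended above concrete, here is a minimal sketch of an atom-based IR. The names are hypothetical and heavily simplified, not lld's actual class hierarchy: every symbol becomes an atom that owns its bytes, and references connect atoms, so the resolver and the garbage collector see one format-independent graph.

    // Hypothetical, simplified sketch of an atom-based IR (not lld's real API).
    // An atom is an indivisible chunk of code or data named by exactly one
    // symbol; references between atoms drive both symbol resolution and GC.
    #include <cstdint>
    #include <string>
    #include <vector>

    struct Atom;

    struct Reference {
      Atom *target;    // the atom this reference points at
      uint64_t offset; // where in the source atom the fixup lives
      uint32_t kind;   // relocation kind, normalized across ELF/COFF/Mach-O
    };

    struct Atom {
      std::string name;             // the single symbol naming this atom
      std::vector<uint8_t> content; // raw bytes
      std::vector<Reference> references;
      bool live = false;            // set by the GC mark phase
    };

    // Format-independent garbage collection: mark from the roots, then drop
    // every atom that was never reached.
    void markLive(Atom *a) {
      if (a->live)
        return;
      a->live = true;
      for (Reference &r : a->references)
        markLive(r.target);
    }

The same graph is what a serialized "native" format or a scripting-language binding would expose; that is the sense in which the model is format-independent.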
On Wed, May 06, 2015 at 09:28:54PM -0500, Shankar Easwaran wrote:
> The atom model allowed lld to have a single intermediate
> representation for all the formats: ELF, COFF, and Mach-O. The native
> format allowed the intermediate representation to be serialized to
> disk too. If the intermediate representation's data structures are
> made available to scripting languages, most of the linker script
> layout functionality can be implemented by the end user. A new
> language could also be developed, since many users need one, and it
> could operate on this intermediate representation.
>
> The atom model also simplified a lot of use cases, such as garbage
> collection, and let the resolver deal just with atoms. The section
> model may sound simple from the outside, but it has its own
> challenges, such as separating symbol information from section
> information.

I'm sorry, but I don't get why any of this requires an atom-based representation. Claiming a single intermediate representation for ELF/COFF on one hand and Mach-O on the other is ironic given the already-mentioned hacks on various layers. Garbage collection doesn't become more expensive when attaching more than one symbol to each code/data fragment. Symbol resolution doesn't change when attaching more than one symbol to each code/data fragment. The list goes on. The single natural advantage is that you can use a single pointer to the canonical symbol from a code/data fragment and don't have to use a list/array. Given the necessary and expensive hacks for splitting sections into (pseudo) atoms, that doesn't feel like a win. So once again, what actual advantages for ELF or COFF have been created by the atom model? Mach-O hardly counts, as it doesn't allow the flexibility of the section model, as has been discussed before.

Joerg
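The "hacks for splitting sections into (pseudo) atoms" are worth seeing spelled out. Below is a hypothetical sketch (illustrative names, not lld's real code) of carving an ELF section into atoms at symbol boundaries; the comments note the cases a real implementation has to patch over.

    // Carve a section of `size` bytes into pseudo-atoms at symbol offsets.
    // Fragile by construction: bytes before the first symbol belong to no
    // atom, aliased symbols at the same offset produce empty atoms, and code
    // that falls through from one carved range into the next breaks if the
    // ranges are reordered or dropped independently.
    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    struct Sym {
      std::string name;
      uint64_t value; // offset of the symbol within its section
    };

    struct PseudoAtom {
      std::string name;
      uint64_t begin, end; // byte range carved out of the section
    };

    std::vector<PseudoAtom> split(std::vector<Sym> syms, uint64_t size) {
      std::sort(syms.begin(), syms.end(),
                [](const Sym &a, const Sym &b) { return a.value < b.value; });
      std::vector<PseudoAtom> atoms;
      for (size_t i = 0; i < syms.size(); ++i) {
        uint64_t end = (i + 1 < syms.size()) ? syms[i + 1].value : size;
        atoms.push_back({syms[i].name, syms[i].value, end});
      }
      return atoms;
    }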
On 5/7/2015 8:38 AM, Joerg Sonnenberger wrote:
> [...] Given the
> necessary and expensive hacks for splitting sections into (pseudo)
> atoms, that doesn't feel like a win. So once again, what actual
> advantages for ELF or COFF have been created by the atom model? Mach-O
> hardly counts, as it doesn't allow the flexibility of the section model,
> as has been discussed before.

The atom model is optimized for code compiled with -ffunction-sections and -fdata-sections. Once targets start having -fno-unique-section-names as the default, the atom model looks even more promising. Everyone wants to keep the image size small, and making -ffunction-sections/-fdata-sections (or -fno-unique-section-names) the default makes sense; the atom model maps directly onto that. In fact, it lets the linker avoid extra data structures, IMO.

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
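To make the connection between those flags and the atom model concrete, consider a hypothetical demo.cpp (file name and symbols invented for illustration); the section names in the comments follow the documented behavior of the flags.

    // demo.cpp — with -ffunction-sections/-fdata-sections, each definition
    // below gets an ELF section of its own (named after the symbol, e.g.
    // .text._Z1fi for f), so a "section" already is an atom and the linker
    // needs no splitting heuristics to GC or reorder individual functions:
    //
    //   clang -c -ffunction-sections -fdata-sections demo.cpp
    //
    // Adding -fno-unique-section-names keeps one definition per section but
    // names every code section plainly ".text", shrinking the object file's
    // string table — the case argued above to be the atom model's natural fit.
    int counter = 42;                    // placed in its own data section

    int f(int x) { return x + counter; } // placed in its own text section

    int g(int x) { return f(x) * 2; }    // placed in its own text section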
All these points were initially advertised as advantages of the atom model, but it has since become clear that they are not as good as we originally expected, at least for the existing file formats. I became more confident in this as I worked not only on PE/COFF but also on ELF. What Joerg wrote is correct. We (including you) have spent so much time discussing how to model various things of the existing file formats in the atom model, which sometimes resulted in a very complex architecture for things that would otherwise have been modeled naturally and effortlessly. We've written a large amount of code to deal with the impedance mismatch between the atom model and the model that the actual file formats expect. I think it's now obvious that that was not a good trade-off if you prefer a simple and clean design.

On Wed, May 6, 2015 at 7:28 PM, Shankar Easwaran <shankare at codeaurora.org> wrote:
> Hi,
>
> There are a lot of advantages to continuing to improve the atom model and
> to building on it.
>
> The atom model allowed lld to have a single intermediate representation
> for all the formats: ELF, COFF, and Mach-O. The native format allowed the
> intermediate representation to be serialized to disk too. [...]
>
> The atom model also simplified a lot of use cases, such as garbage
> collection, and let the resolver deal just with atoms. The section model
> may sound simple from the outside, but it has its own challenges, such as
> separating symbol information from section information. [...]
>
> Shankar Easwaran
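For contrast with the atom sketch earlier in the thread, here is an equally hypothetical sketch of "the model that the actual file formats expect": the linker's unit of work is the input section exactly as ELF/COFF define it, and symbols are just names for positions inside sections.

    // Hypothetical, simplified section-based IR (not a real lld interface).
    #include <cstdint>
    #include <string>
    #include <vector>

    struct InputSection {
      std::string name;             // e.g. ".text._Z1fi", kept format-native
      uint64_t flags;               // native flags (SHF_ALLOC | SHF_EXECINSTR, ...)
      std::vector<uint8_t> content; // bytes, never carved into pseudo-atoms
    };

    struct Symbol {
      std::string name;
      InputSection *section; // nullptr while the symbol is still undefined
      uint64_t value;        // offset within the defining section
    };

    // Any number of symbols may point into the same section; nothing forces a
    // one-to-one pairing of names and byte ranges, which is exactly the
    // impedance mismatch the atom model has to paper over.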
Nobody in this long thread appears to have yet explained why it's a bad idea to allow atomic fragments of code/data (whatever you want to call them: atoms, sections, who cares) to have more than one global symbol attached to them in LLD's internal representation. That seems like it would provide the flexibility needed for ELF without hurting Mach-O. If that change would allow you to avoid splitting the linker into two-codebases-in-one, isn't that preferable?

On Thu, May 7, 2015 at 9:38 AM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:
> I'm sorry, but I don't get why any of this requires an atom-based
> representation. [...] Garbage collection doesn't become
> more expensive when attaching more than one symbol to each code/data
> fragment. Symbol resolution doesn't change when attaching more than one
> symbol to each code/data fragment. [...]
>
> Joerg
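To spell out why the two operations Joerg names are unaffected, here is a hypothetical sketch (invented names, not LLD's real API) of resolution and garbage collection over fragments that carry any number of global symbols.

    // A symbol is a named offset inside a fragment; liveness is a property of
    // the fragment, so marking one symbol's fragment keeps all of its aliases
    // alive at once, and resolution never cares how many names a fragment has.
    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    struct Fragment {
      std::vector<uint8_t> content;
      std::vector<Fragment *> refs; // outgoing references, walked by the GC
      bool live = false;
    };

    struct Def {
      Fragment *frag;
      uint64_t offset;
    };

    struct SymbolTable {
      std::map<std::string, Def> table;

      // First definition wins (weak/common-symbol rules omitted for brevity).
      bool define(const std::string &name, Fragment *f, uint64_t off) {
        return table.emplace(name, Def{f, off}).second;
      }
    };

    void markLive(Fragment *f) {
      if (f->live)
        return;
      f->live = true;
      for (Fragment *r : f->refs)
        markLive(r);
    }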