Hi guys,

After working on LLD for a long time, I think I have found a few things we should improve in the LLD design, for both ease of development and runtime performance. I would like to get feedback on this proposal. Thanks!

*Problems with the current LLD architecture*

The current LLD architecture has, in my opinion, two issues.

*The atom model is not the best model for some architectures*

The atom model makes sense only for Mach-O, but it’s used everywhere. I guess we originally expected that we would be able to model the linker’s behavior beautifully using the atom model, because the atom model seemed like a superset of the section model. Although it *can*, it turned out that it’s not necessarily a natural or efficient model for ELF or PE/COFF, on which section-based linking is expected. On ELF and PE/COFF, sections are the units of atomic data. We divide a section into smaller “atoms” and then restore the original data layout later to preserve the section’s atomicity. That complicates the linker internals. It also slows down the linker because of the overhead of creating and manipulating atoms.

In addition, since section-based linking is expected on these architectures, some linker features are defined in terms of sections. An example is “select largest section” in PE/COFF. In the atom model we have no notion of sections at all, so we had to simulate such features using atoms in tricky ways.

*One symbol resolution model doesn’t fit all*

The symbol resolution semantics are not the same on the three architectures (ELF, Mach-O and PE/COFF), but we have only one "core" linker for symbol resolution. The core linker implements the Unix linker semantics: the linker visits one file at a time until all undefined symbols are resolved. For archive files with circular dependencies, you can group them to tell the linker to visit them more than once. This is not the only way to build a linker, and it is neither the simplest nor the fastest. It’s just that the Unix linker semantics are designed this way, and we all follow them for compatibility.

For PE/COFF, the linker semantics are different. The order of files on the command line doesn’t matter. The linker scans all files first to create a map from symbols to files, and uses the map to resolve all undefined symbols. The PE/COFF semantics are currently simulated using the Unix linker semantics and groups. That made the linker inefficient because of the overhead of visiting archive files again and again. It also made the code bloated and awkward.

In short, we generalize too much, and we share code too much.

*Proposal*

1. Re-architect the linker based on the section model where it’s appropriate.
2. Stop simulating different linker semantics using the Unix model. Instead, directly implement the native behavior.

When it’s done, the atom model will be used only for Mach-O. The other two will be built on the section model. PE/COFF will have a different "core" linker than Unix’s. I expect this will simplify the design and also improve the linker’s performance (achieving better performance is probably the best way to convince people to try LLD).

I don’t think we can gradually move from the atom model to the section model, because atoms are everywhere. The two models are so different that we cannot mix them in one place. Although we can reuse the design and the outline of the existing code, this is going to be more like a major rewrite than an update. So I propose developing the section-based ports as new "ports" of LLD.

I plan to start working on the PE/COFF port first because I’m familiar with the code base and it has less code than the ELF port. Also, the fact that the ELF port is developed and maintained by many developers makes porting harder compared to PE/COFF, which is written and maintained only by me. Thus, I’m going to use PE/COFF as an experimental platform to see how it works. Here is a plan.

1. Create a section-based PE/COFF linker backend as a new port.
2. If everything is fine, do the same thing for ELF. We may want to move common code for a section-based linker out of the new PE/COFF port to share it with ELF.
3. Move the library for the atom model to the sub-directory for the Mach-O port.

The resulting linker will share less code between ports. That’s not necessarily a bad thing; we actually think it’s a good thing, because in order to share code we currently have too many workarounds. This change should fix the balance so that we get (1) shared code that’s naturally able to be shared by multiple ports, and (2) simpler, faster code.

*Work Estimation*

It’s hard to tell, but I can probably create a PE/COFF linker in a few weeks that works reasonably well and is ready for code review as a first set of patches. I have already built a complete linker for Windows, so the hardest part (understanding it) is already done. Once it’s done, I can give a better estimate for ELF.

*Caveat*

*Why not define a section as an atom and keep using the atom model?*

If we do this, we would have to allow atoms to have more than one name. Each name would have an offset in the atom (to represent symbols whose offset from the section start is not zero). But we would still need to copy section attributes to each atom. The resulting model no longer looks like the atom model, but like a mix of the atom model and the section model, and it comes with the cost of both designs. I think it’s too complicated.

*Notes*

We want to make sure there are no existing LLD users who depend on the atom model for ELF, or, if there are such users, we want to come up with a transition path for them.
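
The split-and-restore cost described in the proposal can be illustrated with a small Python sketch. This is not LLD code: `Symbol`, `Section`, `Atom`, `split_into_atoms` and `restore_layout` are hypothetical names chosen for illustration, and the sketch assumes only what the message states, namely that a section carries shared attributes and several symbols at offsets, and that splitting it into atoms means copying those attributes to each piece.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Symbol:
    name: str
    offset: int  # byte offset from the start of the section


@dataclass
class Section:
    """Section model: one blob of data, shared attributes, many symbols."""
    name: str
    attributes: frozenset
    data: bytes
    symbols: list = field(default_factory=list)


@dataclass
class Atom:
    """Atom model: one named piece of data carrying its own attribute copy."""
    name: str
    attributes: frozenset
    data: bytes


def split_into_atoms(section):
    """Cut a section at every symbol offset, copying attributes to each atom.

    Simplifying assumption: the first symbol sits at offset 0.
    """
    symbols = sorted(section.symbols, key=lambda s: s.offset)
    atoms = []
    for i, sym in enumerate(symbols):
        end = symbols[i + 1].offset if i + 1 < len(symbols) else len(section.data)
        atoms.append(Atom(sym.name, section.attributes, section.data[sym.offset:end]))
    return atoms


def restore_layout(atoms):
    """Re-join the atoms in order to get the original section contents back."""
    return b"".join(atom.data for atom in atoms)


text = Section(
    name=".text",
    attributes=frozenset({"read", "execute"}),
    data=bytes(range(12)),
    symbols=[Symbol("foo", 0), Symbol("bar", 8)],
)
atoms = split_into_atoms(text)
```

In the section model the linker would carry `text` around as a single unit; in the atom model it carries `atoms` and has to keep them adjacent and in order so that `restore_layout(atoms)` still equals the original data.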
On Fri, May 1, 2015 at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> Caveat
> Why not define a section as an atom and keep using the atom model? If
> we do this, we would have to allow atoms to have more than one name. Each
> name would have an offset in the atom (to represent symbols whose offset
> from the section start is not zero). But still we need to copy section
> attributes to each atom. The resulting model no longer looks like the atom
> model, but a mix of the atom model and the section model, and that comes
> with the cost of both designs. I think it’s too complicated.

Rafael and I have been discussing this change recently. It makes atoms actually atomic, and also splits out symbols, which has been needed. The main reason I like this over each target having its own model is that it gives us a common textual representation to write tests with.

As for symbol resolution, it seems the actual problem is name lookup, not the core resolver semantics.

I'd rather not end up with basically 3 separate linkers in lld.

- Michael Spencer
On Fri, May 1, 2015 at 1:32 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> On Fri, May 1, 2015 at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> > Caveat
> > Why not define a section as an atom and keep using the atom model? If
> > we do this, we would have to allow atoms to have more than one name. Each
> > name would have an offset in the atom (to represent symbols whose offset
> > from the section start is not zero). But still we need to copy section
> > attributes to each atom. The resulting model no longer looks like the atom
> > model, but a mix of the atom model and the section model, and that comes
> > with the cost of both designs. I think it’s too complicated.
>
> Rafael and I have been discussing this change recently. It makes atoms
> actually atomic, and also splits out symbols, which has been needed.
> The main reason I like this over each target having its own model is
> because it gives us a common textual representation to write tests
> with.

If you allow multiple symbols in one atom, is the new definition of atom different from section? If so, in what way?

> As for symbol resolution. It seems the actual problem is name lookup,
> not the core resolver semantics.

What's the difference between name lookup and the core resolver semantics?

> I'd rather not end up with basically 3 separate linkers in lld.

I basically agree. However, if you take a look at the code of the PE/COFF port, you'll find something weird here and there.
I am at the airport waiting to go on vacation, but I must say I am extremely happy to see this happen!

I agree with the proposed direction and steps:

Implement section-based linking for COFF.

Use that for ELF.

If it makes sense, use it for Mach-O.
On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> The atom model is not the best model for some architectures

The atom model is a good fit for the llvm compiler model for all architectures. There is a one-to-one mapping between llvm::GlobalObject (e.g. function or global variable) and lld::DefinedAtom.

The problem is the ELF/PECOFF file format. (Actually mach-o is also section based, but we have refrained from adding complex section-centric features to it, so mapping it to atoms is not too hard.)

I’d rather see our effort put to moving ahead to an llvm based object file format (aka “native” format) which bypasses the impedance mismatch of going through ELF/COFF.

> One symbol resolution model doesn’t fit all

Yes, the Resolver was meant to call out to the LinkingContext object to direct it on how to link. Somehow that got morphed into “there should be a universal data model that when the Resolver processes the input data, the right platform specific linking behavior falls out”.

-Nick
On May 1, 2015 9:42 PM, "Nick Kledzik" <kledzik at apple.com> wrote:
> On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> > The atom model is not the best model for some architectures
>
> The atom model is a good fit for the llvm compiler model for all
> architectures. There is a one-to-one mapping between llvm::GlobalObject
> (e.g. function or global variable) and lld::DefinedAtom.

That is not the input to the linker and therefore irrelevant.

> The problem is the ELF/PECOFF file format. (Actually mach-o is also
> section based, but we have refrained from adding complex section-centric
> features to it, so mapping it to atoms is not too hard).

The objective is to build an ELF and COFF linker. The input has sections, and splitting them is a total waste of time and extra design complexity.

> I’d rather see our effort put to moving ahead to an llvm based object
> file format (aka “native” format) which bypasses the impedance mismatch
> of going through ELF/COFF.

Absolutely not. We have to be able to handle ELF and COFF and do it well.

Also, gold shows that ELF at least works extremely well. With function sections the compiler is in complete control of the size of the units the linker uses. With my recent work on MC the representation is also very efficient. I have no reason to believe COFF is any different.

Cheers,
Rafael
On Fri, May 1, 2015 at 6:46 PM Nick Kledzik <kledzik at apple.com> wrote:
> On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> > *The atom model is not the best model for some architectures*
>
> The atom model is a good fit for the llvm compiler model for all
> architectures. There is a one-to-one mapping between llvm::GlobalObject
> (e.g. function or global variable) and lld::DefinedAtom.

I'm not sure how that's really relevant.

On some architectures, the unit at which linking is defined to occur isn't a global object. A classic example of this is architectures that have a hard semantic reliance on grouping two symbols together and linking either both or neither of them.

> The problem is the ELF/PECOFF file format. (Actually mach-o is also
> section based, but we have refrained from adding complex section-centric
> features to it, so mapping it to atoms is not too hard).
>
> I’d rather see our effort put to moving ahead to an llvm based object file
> format (aka “native” format) which bypasses the impedance mismatch of going
> through ELF/COFF.

We still have to be able to (efficiently) link existing ELF and COFF objects though? While I'm actually pretty interested in some better object file format, I also want a better linker for the world we live in today...

> > *One symbol resolution model doesn’t fit all*
>
> Yes, the Resolver was meant to call out to the LinkingContext object to
> direct it on how to link. Somehow that got morphed into “there should be a
> universal data model that when the Resolver process the input data, the
> right platform specific linking behavior falls out”.
>
> -Nick
On 1 May 2015 at 20:31, Rui Ueyama <ruiu at google.com> wrote:
> One symbol resolution model doesn’t fit all
> The symbol resolution semantics are not the same on three architectures
> (ELF, Mach-O and PE/COFF), but we only have only one "core" linker for the
> symbol resolution. The core linker implements the Unix linker semantics; the
> linker visits a file at a time until all undefined symbols are resolved.
> For archive files having circular dependencies, you can group them to tell
> the linker to visit them more than once. This is not the only model to
> create a linker. It’s not the simplest nor fastest. It’s just that the Unix
> linker semantics is designed this way, and we all follow for compatibility.
> For PE/COFF, the linker semantics are different. The order of files in the
> command line doesn’t matter. The linker scans all files first to create a
> map from symbols to files, and use the map to resolve all undefined
> symbols. The PE/COFF semantics are currently simulated using the Unix
> linker semantics and groups. That made the linker inefficient because of
> the overhead to visit archive files again and again. Also it made the code
> bloated and awkward. In short, we generalize too much, and we share code
> too much.

Why can't LLD be free to implement a resolving algorithm that performs better?

The PE/COFF method you describe seems more efficient than the existing ELF method. What is stopping LLD from using the PE/COFF method for ELF? It could also do further optimizations such as caching the resolved symbols. To me, the existing algorithms read as ELF == full table scan, PE/COFF == indexed.

Also, could some of the symbol resolution be done at compile time? E.g. if I include stdio.h, I know which link-time library that is associated with, so I can resolve those symbols at compile time. Maybe we could store that information in the pre-compiled headers file format, and subsequently in the .o files. This would then leave far fewer symbols to resolve at link time.

Kind Regards

James
On Mon, May 04, 2015 at 09:29:16AM +0100, James Courtier-Dutton wrote:
> Also, could some of the symbol resolution be done at compile time?
> E.g. If I include stdio.h, I know which link time library that is
> associated with, so I can resolve those symbols at compile time.

Where would you get that information from? No such tagging exists in standard C or even the extended dialect of C clang is implementing.

Joerg
Most of what I wanted to say has been said, but I wanted to explicitly call out COMDAT groups as something that we want that doesn't fit the atom model very well.

Adding first class COMDATs was necessary for implementing large parts of the Microsoft C++ ABI, but it also turns out that it's really handy on other platforms. We've made a number of changes to Clang's IRgen to do things like eliminate duplicate dynamic initialization for static data members of class templates and share code for complete, base, and deleting destructors.

Basically, COMDAT groups are a tool that the compiler can use to change the way things are linked without changing the linker. They allow the compiler to add new functionality and reduce coupling between the compiler and the linker. This is a real tradeoff worth thinking about.

I think for many platforms (Windows, Linux) Clang is not the system compiler and we need to support efficiently linking against existing libraries for a long time to come. There are other platforms (Mac, PS4) with a single toolchain where controlling the linker allows adding new functionality quickly.

I think Alex is right, we should probably meet some time and figure out what people need and how to support both kinds of platform well.

Reid
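
As a rough illustration of why COMDAT groups are naturally a section-level feature, here is a minimal Python sketch of group selection. It is not how any real linker implements it: `select_comdats`, `total_size` and the group and section names are made up, and only two selection kinds are modelled ("any", where the first copy seen wins, and "largest", as in the "select largest section" feature mentioned earlier in the thread). The point it tries to show is that all sections of a group are kept or discarded together.

```python
def total_size(sections):
    """Sum of the sizes of a group's sections; each section is (name, size)."""
    return sum(size for _, size in sections)


def select_comdats(candidates):
    """Pick one copy of every COMDAT group.

    candidates: list of (group_name, selection, sections) in input order.
    Returns the surviving sections; a group is kept or dropped as a whole.
    """
    chosen = {}
    for group, selection, sections in candidates:
        if group not in chosen:
            chosen[group] = sections  # first copy seen
        elif selection == "largest" and total_size(sections) > total_size(chosen[group]):
            chosen[group] = sections  # a bigger copy replaces the earlier one
        # selection == "any": later copies are discarded
    return [section for sections in chosen.values() for section in sections]


# Hypothetical inputs: two object files each provide both groups.
candidates = [
    ("dtor_Foo", "any", [(".text.dtor_Foo.complete", 40), (".text.dtor_Foo.deleting", 24)]),
    ("table", "largest", [(".rdata.table", 16)]),
    ("dtor_Foo", "any", [(".text.dtor_Foo.complete", 48)]),
    ("table", "largest", [(".rdata.table", 64)]),
]
kept = select_comdats(candidates)
```

With these inputs the first `dtor_Foo` group survives with both of its sections, and the second, larger `table` replaces the first.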
On Mon, May 4, 2015 at 1:29 AM, James Courtier-Dutton <james.dutton at gmail.com> wrote:
> On 1 May 2015 at 20:31, Rui Ueyama <ruiu at google.com> wrote:
> > One symbol resolution model doesn’t fit all
> > The symbol resolution semantics are not the same on three architectures
> > (ELF, Mach-O and PE/COFF), but we only have only one "core" linker for
> > the symbol resolution. The core linker implements the Unix linker
> > semantics; the linker visits a file at a time until all undefined symbols
> > are resolved. For archive files having circular dependencies, you can
> > group them to tell the linker to visit them more than once. This is not
> > the only model to create a linker. It’s not the simplest nor fastest.
> > It’s just that the Unix linker semantics is designed this way, and we all
> > follow for compatibility. For PE/COFF, the linker semantics are
> > different. The order of files in the command line doesn’t matter. The
> > linker scans all files first to create a map from symbols to files, and
> > use the map to resolve all undefined symbols. The PE/COFF semantics are
> > currently simulated using the Unix linker semantics and groups. That made
> > the linker inefficient because of the overhead to visit archive files
> > again and again. Also it made the code bloated and awkward. In short, we
> > generalize too much, and we share code too much.
>
> Why can't LLD be free to implement a resolving algorithm that performs
> better.
> The PE/COFF method you describe seems more efficient that the existing
> ELF method.
> What is stopping LLD from using the PE/COFF method for ELF. It could
> also do further optimizations such as caching the resolved symbols.
> To me,the existing algorithms read as ELF == Full table scan, PE/COEF
> == Indexed.

The two semantics are not compatible. The results of the two are not always the same.

For example, this is why we have to pass -lc after the object files instead of at the beginning of the command line. "ld -lc foo.o" would just skip libc, because when the linker visits the library there are no undefined symbols to be resolved. foo.o would then add undefined symbols that could have been resolved using libc, but it's too late. The link would fail. This is how linkers on Unix work. There are other differences that follow from this one, so we cannot change the behavior without breaking compatibility.

> Also, could some of the symbol resolution be done at compile time?
> E.g. If I include stdio.h, I know which link time library that is
> associated with, so I can resolve those symbols at compile time.
> Maybe we could store that information in the pre-compiled headers file
> format, and subsequently in the .o files.
> This would then leave far fewer symbols to resolve at link time.

You can link against an alternative libc, for example, so that's not usually doable.
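
The incompatibility can be shown with a toy model in Python. This is a deliberately simplified sketch, not the algorithm of any real linker: `add_object`, `unix_link` and `indexed_link` are hypothetical names, inputs are plain dictionaries of defined and referenced symbol names, and details such as weak symbols, duplicate definitions and archive groups are ignored.

```python
def add_object(obj, defined, undefined):
    """Merge one object file's symbols into the running link state."""
    defined.update(obj["defines"])
    undefined.difference_update(obj["defines"])
    undefined.update(obj["references"] - defined)


def unix_link(inputs):
    """Unix-style: visit inputs left to right. An archive member is pulled in
    only if it defines a symbol that is undefined when the archive is visited."""
    defined, undefined = set(), set()
    for kind, payload in inputs:
        if kind == "object":
            add_object(payload, defined, undefined)
            continue
        progress = True
        while progress:  # rescan this archive until nothing new is pulled in
            progress = False
            for member in payload:
                if member["defines"] & undefined:
                    add_object(member, defined, undefined)
                    progress = True
    return undefined  # symbols still unresolved at the end


def indexed_link(inputs):
    """Map-based: scan everything first to build a symbol -> member map,
    then resolve undefined symbols by lookup, regardless of input order."""
    index, objects = {}, []
    for kind, payload in inputs:
        if kind == "object":
            objects.append(payload)
        else:
            for member in payload:
                for name in member["defines"]:
                    index.setdefault(name, member)
    defined, undefined = set(), set()
    for obj in objects:
        add_object(obj, defined, undefined)
    pending = sorted(undefined)
    while pending:
        name = pending.pop()
        if name in defined or name not in index:
            continue
        before = set(undefined)
        add_object(index[name], defined, undefined)
        pending.extend(sorted(undefined - before))
    return undefined


# Toy inputs standing in for libc.a and foo.o.
libc = [
    {"defines": {"printf"}, "references": {"write"}},
    {"defines": {"write"}, "references": set()},
]
foo = {"defines": {"main"}, "references": {"printf"}}

library_first = [("archive", libc), ("object", foo)]
object_first = [("object", foo), ("archive", libc)]
```

In this model `unix_link(library_first)` leaves `printf` unresolved while `unix_link(object_first)` resolves everything, whereas `indexed_link` resolves everything in both orders. That difference in outcome is why the map-based approach cannot simply replace the Unix one for ELF.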
On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
> Proposal
> 1. Re-architect the linker based on the section model where it’s appropriate.
> 2. Stop simulating different linker semantics using the Unix model. Instead, directly implement the native behavior.

Preface: I have never personally contributed code to LLD, so don’t take anything I’m about to say too seriously. This is not a mandate or anything, just an observation/idea.

I think that there is an alternative solution to these exact same problems. What you’ve identified here is that there are two camps of people working on LLD, and they have conflicting goals:

- Camp A: LLD is infrastructure for the next generation of awesome linking and toolchain features; it should take advantage of how compilers work to offer new features, performance, etc. without deep concern for compatibility.

- Camp B: LLD is a drop-in replacement system linker (notably for COFF and ELF systems), which is best of breed and with no compromises w.r.t. that goal.

I think the problem here is that these lead to natural and inescapable tensions, and Alex summarized how Camp B has been steering LLD away from what Camp A people want. This isn’t bad in and of itself, because what Camp B wants is clearly and unarguably good for LLVM. However, it is also not sufficient, and while innovation in the linker space (e.g. a new “native” object file format generated directly from compiler structures) may or may not actually “work” or be “worth it”, we won’t know unless we try, and that won’t fulfill its promise if there are compromises to Camp B.

So here’s my counterproposal: two different linkers.

Let’s stop thinking about lld as one linker, and instead think of it as two different ones. We’ll build a Camp B linker which is the best of breed section based linker. It will support linker scripts and do everything better than any existing section based linker. The first step of this is to do what Rui proposes and rip atoms out of the model.

We will also build a no-holds-barred awesome atom based linker that takes advantage of everything it can from LLVM’s architecture to enable innovative new tools without worrying too much about backwards compatibility.

These two linkers should share whatever code makes sense, but also shouldn’t try to share code that doesn’t make sense. The split between the semantic model of sections vs atoms seems like a very natural one to me.

One question is: does it make sense for these to live in the same lld subproject, or be split into two different subprojects? I think the answer to that question is driven from whether there is shared code common between the two linkers that doesn’t make sense to sink down to the llvm subproject itself.

What do you think?

-Chris
On Mon, May 04, 2015 at 12:52:55PM -0700, Chris Lattner wrote:
> I think the problem here is that these lead to natural and inescapable
> tensions, and Alex summarized how Camp B has been steering LLD away
> from what Camp A people want. This isn’t bad in and of itself, because
> what Camp B wants is clearly and unarguably good for LLVM. However,
> it is also not sufficient, and while innovation in the linker space
> (e.g. a new “native” object file format generated directly from
> compiler structures) may or may not actually “work” or be “worth it”,
> we won’t know unless we try, and that won’t fulfill its promise if
> there are compromises to Camp B.

It has been said in this thread before, but I fail to see how the atom model is an actual improvement over the fine-grained section model. It seems to be artificially restricted for no good reason.

> Lets stop thinking about lld as one linker, and instead think of it is
> two different ones. We’ll build a Camp B linker which is the best of
> breed section based linker. It will support linker scripts and do
> everything better than any existing section based linker. The first
> step of this is to do what Rui proposes and rip atoms out of the model.

This is another item that has been irritating me. While it is a very laudable goal to not depend on linker scripts for the common case, not having the functionality of fine-grained output control is certainly a problem. Linker scripts are crucial for embedded developers and also at least significant for anything near a system kernel.

> We will also build a no-holds-barred awesome atom based linker that
> takes advantage of everything it can from LLVM’s architecture to enable
> innovative new tools without worrying too much about backwards
> compatibility.

I'd say that a good justification for why an atom based linker is or can be better would be a good start...

Joerg
I'm sorry if my suggestion gave the impression that I disregard the Mach-O port of the LLD linker. I do care about Mach-O. I do not plan to break or remove any functionality from the current Mach-O port of LLD. I don't propose to remove the atom model from the linker as long as it seems to be a good fit for the port (and it looks like it is).

As to the proposal to have two different linkers, I'd think that's not really a counter-proposal, as it's similar to what I'm proposing. Maybe the view of "future file formats vs. the existing formats" (or "experimental platform vs. practical tool") is not the right way to capture the difference between the atom model and the section model, since Mach-O is an existing file format which we'd want to keep on the atom model. I think we want both even for the existing formats.

My proposal can be read as suggesting we split the LLD linker into two major parts, the atom model-based and the section model-based, while keeping the two under the same project and repository. I still think that we can share code between the two, especially for LTO, which is why I prefer to have the two under the same repository.
Hi,

There are a lot of advantages to continuing to improve the atom model and to keep working with it.

The atom model allowed lld to have a single intermediate representation for all the formats (ELF, COFF and Mach-O). The native model allowed that intermediate representation to be serialized to disk too. If the intermediate representation's data structures are made available to scripting languages, most linker script layout can be implemented by the end user. A new language can also be developed as users need it, and it can work on this intermediate representation.

The atom model also simplified a lot of use cases, like garbage collection, and lets the resolver deal just with atoms. The section model may sound simple from the outside, but it has its own challenges, like separating the symbol information from the section information.

The atom model also simplifies testing, as there is one unique, nice way to test the core linker independent of the format. In addition to testing, there are tools designed to convert ELF to COFF or vice versa, and lld supports these use cases by design.

Most embedded users want to reduce the final image size by compiling code with -ffunction-sections and -fdata-sections, which the atom model models directly. Thanks to Espindola for adding support for -fno-unique-section-names, which makes lld and the atom model more useful.

lld has already proven that it can link most of our llvm tools and self-host with reasonable performance, so I don't see why we wouldn't want to continue with the atom model. The atom model also eases dealing with LTO in general.

In summary, I would like to continue the ELF ports using the atom model.

_If a section model is chosen to model flavors, let's not mix it up with the atom model, as I can see there would be very little code sharing._

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
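One way to picture the garbage-collection point in Shankar's message: with -ffunction-sections and -fdata-sections, each function or data item becomes its own unit (an atom, or a one-item section), and dead-stripping is a reachability walk from the entry points. The sketch below is only an illustration under assumptions: the unit names, the `edges` map and the `live_units` function are hypothetical and are not LLD's data structures or API.

```python
from collections import deque


def live_units(edges, roots):
    """Return the set of units reachable from the root units.

    edges maps a unit name to the names of the units it references
    (in a real linker these would come from relocations).
    """
    live = set()
    work = deque(roots)
    while work:
        unit = work.popleft()
        if unit in live:
            continue  # already marked; also guards against reference cycles
        live.add(unit)
        work.extend(edges.get(unit, ()))
    return live


# Hypothetical input: one unit per function or data item.
edges = {
    "main": ["parse", "table"],
    "parse": ["table"],
    "table": [],
    "unused_helper": ["table"],
}

kept = live_units(edges, ["main"])
print(sorted(kept))  # ['main', 'parse', 'table']
```

The walk itself is the same whether the units are called atoms or fine-grained sections; what differs between the two models is how the units and their references are represented.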
On Mon, May 4, 2015 at 12:52 PM, Chris Lattner <clattner at apple.com> wrote:
> On May 1, 2015, at 12:31 PM, Rui Ueyama <ruiu at google.com> wrote:
>
> *Proposal*
>
> 1. Re-architect the linker based on the section model where it’s
> appropriate.
> 2. Stop simulating different linker semantics using the Unix model.
> Instead, directly implement the native behavior.
>
> Preface: I have never personally contributed code to LLD, so don’t take
> anything I’m about to say too seriously. This is not a mandate or
> anything, just an observation/idea.
>
> I think that there is an alternative solution to these exact same
> problems. What you’ve identified here is that there are two camps of
> people working on LLD, and they have conflicting goals:
>
> - Camp A: LLD is infrastructure for the next generation of awesome linking
> and toolchain features, it should take advantage of how compilers work to
> offer new features, performance, etc without deep concern for compatibility.
>
> - Camp B: LLD is a drop in replacement system linker (notably for COFF and
> ELF systems), which is best of breed and with no compromises w.r.t. that
> goal.
>
> I think the problem here is that these lead to natural and inescapable
> tensions, and Alex summarized how Camp B has been steering LLD away from
> what Camp A people want.

I don't think this is correct.

The only reason we should be having a major split along Camp A and Camp B like you describe is in the face of a concrete "compatibility" feature that *absolutely cannot* be implemented without sacrificing a modular, library-based, well-architected design. That is not what this thread is about, so I don't think your split is accurate.

I think it has merely been *forgotten* that it would be useful to have an infrastructure rather than "main() in a library", or even "main() in a separate binary"! This is the LLVM ethos (as I'm sure you know far better than I ;).

Both LLVM and Clang prove that hard problems with significant "compatibility" concerns can be tackled, and top-tier in-production QoI achieved, while still upholding a high standard of external reusability that allows novel uses. Given what LLVM and Clang have been able to absorb (x86 ISA? C++? MSVC inline asm?) without fundamentally destabilizing their modular design, I think that we can do a bit better with LLD before throwing in the towel and cleaving it into a "compatibility" part and a "shiny" part.

-- Sean Silva
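To make the "infrastructure rather than main() in a library" distinction concrete, here is a rough sketch of what a library-style entry point might look like. Everything in it is an assumption for illustration: `LinkResult`, `link` and `main` are hypothetical names, the argument handling is a toy, and none of this is LLD's actual interface. The only point shown is that the library function reports back to its caller, while the thin driver alone decides about printing and exit codes.

```python
from dataclasses import dataclass, field


@dataclass
class LinkResult:
    """Outcome handed back to the caller instead of exiting the process."""
    ok: bool
    diagnostics: list = field(default_factory=list)


def link(args):
    """Hypothetical library entry point: never prints, never exits."""
    inputs = [a for a in args if not a.startswith("-")]
    diagnostics = []
    if not inputs:
        diagnostics.append("error: no input files")
    return LinkResult(ok=not diagnostics, diagnostics=diagnostics)


def main(argv):
    """Thin driver: the only place a result becomes output and an exit code."""
    result = link(argv[1:])
    for message in result.diagnostics:
        print(message)
    return 0 if result.ok else 1
```

With this shape, another tool can call `link(["a.o"])` in-process and inspect the result, which is the kind of reuse a "main() in a library" design rules out.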