Arseny Kapoulkine via llvm-dev
2016-Jan-22 00:01 UTC
[llvm-dev] lld: ELF/COFF main() interface
> If I were a user, I definitely want the former instead of the latter
> because the former just provides more.

This is if you wanted to use the library (e.g. embed the linker into clang, do parallel linking of many executables from the same process, etc.). For some use cases there's no difference, because the only thing you'll do with the library is link it into a command-line executable and run it.

> The current design is not the result of a short-sighted choice but the
> result of a deliberate trade-off.

I don't really understand where this is coming from, honestly. I understand the frustration of dealing with layers of abstraction that do not fit perfectly within the established framework. I do not understand the resistance to avoiding global state and to propagating errors to the top level.

Arseny

On Thu, Jan 21, 2016 at 2:54 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Thu, Jan 21, 2016 at 2:15 PM, Arseny Kapoulkine <arseny.kapoulkine at gmail.com> wrote:
>
>> As the person who started this thread I should probably comment on the interface.
>>
>> My needs only require a library-like version of a command-line interface. To be specific, the interface that would work okay is the old high-level lld interface:
>>
>>   bool link(ArrayRef<const char*> args, raw_ostream& diagnostics)
>>
>> This would require round-tripping data through files, which is not ideal but not too bad.
>>
>> The ideal version for my cases would be:
>>
>>   bool link(ArrayRef<const char*> args, raw_ostream& diagnostics,
>>             ArrayRef<unique_ptr<MemoryBuffer>> inputs)
>>
>> This is slightly harder to implement, since the lld command-line driver needs a layer of argument parsing that separates inputs from other args, but it's probably not too bad.
>>
>> So note that the easiest interface that would satisfy my needs is similar to a command-line interface; it should not write errors to stderr and should return on errors instead of aborting.
>> These requirements do not seem like they would severely complicate the design. They also do not seem contentious - there is no risk of overdesigning! I don't see a reason why having this interface would be bad. The only "challenge" really is error propagation, which is mostly an implementation concern; as mentioned previously, there could be partial solutions where you rely on the validity of inputs within reasonable bounds and/or provide separate validation functions and keep the core of the linker lean.
>>
>> Also, not all platforms have working fork, fast process startup, or the other facilities that may be suggested as workarounds for the current situation. I can already invoke the system linker using those facilities - lld is (was) attractive *exactly* because it does not require them! (Plus it provides more consistent performance, and it's far easier to debug/enhance/profile within one process, etc.) Honestly, the old lld is very much like LLVM in that I can freely use it in my environment, whereas the new lld reminds me of my experience integrating mcpp (a C preprocessor) into an in-engine shader compiler a few years ago, with all the same issues (exit() on error, stdout, etc.) that had to be worked around by modifying the codebase.

> I'm not going to argue that we don't want to support the library use scenario discussed in this thread, but I'd like to point out that it is natural for all users to want a linker that is usable both as a command and as a library: if you compare a linker/linker-library with just a linker, the former is strictly a superset of the latter. If I were a user, I would definitely want the former instead of the latter, because the former simply provides more. In that sense, only the developers would argue that there's a good reason to provide less (and that is the case here, as only Rafael and I were arguing that way).
> The current design is not the result of a short-sighted choice but the result of a deliberate trade-off. I worked on the old LLD for more than a year, and I made a decision based on that experience. I still think it was a good design choice not to start writing the new LLD as a library. Again, we are open to future changes, but it is probably not the right time.
>
>> Arseny
>>
>> On Thu, Jan 21, 2016 at 11:14 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>
>>> On Thu, Jan 21, 2016 at 10:49 AM Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>>>
>>>> > There are probably others, but this is the gist of it. Now, you could still design everything with the simplest imaginable API, one that is incredibly narrow and specialized for a *single* user. But there are still fundamentals of the style of code that are absolutely necessary to build a library. And the only way to make sure we get this right is to have the single user of the code use it as a library and keep all the business logic inside the library.
>>>> >
>>>> > This pattern is fundamental to literally every part of LLVM, including Clang, LLDB, and thus far LLD. I think it is a core principle of the project as a whole. I think that unless LLD continues to follow this principle, it doesn't really fit in the LLVM project at all.
>>>>
>>>> The single user so far is the one the people actually coding the project care about. It seems odd to say that it doesn't fit in the LLVM project when it has attracted a lot of contributors and hit some important milestones.
>>>
>>> I don't think that every open source effort relating to compilers belongs in the LLVM project. I think it has to fit with the overarching goals and design of the LLVM project as a whole. This includes, among other things, being modular and reusable.
>>>> > So, I encourage LLD to keep its interfaces highly specialized for the users it actually has -- and indeed today that may be exactly one user, the command-line linker.
>>>>
>>>> We have a highly specialized API consisting of one function: elf2::link(ArrayRef<const char *> Args). That fits 100% of the uses we have. If there is ever another use, we can evaluate the cost of supporting it, but first we need to actually write the linker.
>>>
>>> Note that I'm perfectly happy with this *interface* today, provided it is genuinely built as a library and can be used in that context. See below.
>>>
>>>> Note that this is history repeating itself on a bigger scale. We used to have a fancy library to handle archives, and llvm-ar was written on top of it. It was the worst ar implementation by far. It had horrible error handling, incompatible options, and produced ar files with indexes that no linker could use.
>>>>
>>>> I nuked the library and wrote llvm-ar as the trivial program that it is. To the best of my knowledge it was then the fastest ar in existence, actually useful (linkers can use its .a files), and far easier to maintain.
>>>
>>> The fact that it was in a library is, IMO, completely orthogonal to the fact that the design of that library ended up not working.
>>>
>>> Bad library code is indeed bad. That doesn't mean that it is terribly hard to write good library code. As you say:
>>>
>>>> When the effort to support Windows came up, there was a need to create archives from within lld, since link.exe can run lib.exe. The maintainable code was easy to refactor into one library function, llvm::writeArchive. If another use ever shows up, we evaluate it. If not, we keep the very narrow interface.
>>>
>>> Yes, +1 to a narrow interface, but I think it should always be *in a library*. That impacts much more than just the interface.
>>>> > Finally, I will directly state that we (Google) have a specific interest in both linking LLD libraries into the Clang executable rather than having separate binaries, and in invoking LLD to link many different executables from a single process. So there is at least one concrete user here today. Now, the API we would need for both of these is *exactly* the API that the existing command-line linker would need. But the functionality would have to be reasonable to access via a library call.
>>>>
>>>> Given that clang can fork, I assume that this new clang+lld can fork.
>>>
>>> No, it cannot in all cases. We have genuine use cases where forking isn't realistically an option. As an example, imagine that you want to use something like ClangMR (which I presented ages ago) but actually *link* code massively at scale to do post-link analysis of the binaries. There are environments where we need to be able to run the linker on multiple different threads in a single address space and collect the linked object in an in-memory buffer.
>>>
>>> Also, one of the other possible motivations for using LLD directly from Clang would be to avoid process overhead on operating systems where that is a much more significant part of the compile-time cost. We could today actually take the fork out of the Clang driver, because the Clang frontend *is* designed in this way. But we would also need LLD to work in this way.
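The library-style entry point described above can be sketched in plain C++. This is an illustrative stand-in, not lld's actual API: std::vector<std::string> and std::ostream stand in for llvm::ArrayRef and llvm::raw_ostream, and the body only validates its arguments.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical library-style driver entry point, modeled on the old lld
// interface bool link(ArrayRef<const char*> args, raw_ostream& diagnostics).
// The two properties an embedder needs: errors go to a caller-supplied
// stream (never stderr), and failure is a return value (never exit()).
bool link(const std::vector<std::string>& args, std::ostream& diagnostics) {
  if (args.empty()) {
    diagnostics << "error: no input files\n";
    return false;  // report failure to the caller instead of aborting
  }
  for (const std::string& arg : args) {
    if (arg.empty()) {
      diagnostics << "error: empty argument\n";
      return false;
    }
  }
  // A real implementation would parse the args and perform the link here.
  return true;
}
```

An embedder can then capture diagnostics in a std::ostringstream and invoke the linker many times from one process, which is exactly the scenario discussed in this thread.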
On Thu, Jan 21, 2016 at 4:01 PM, Arseny Kapoulkine <arseny.kapoulkine at gmail.com> wrote:

> I don't really understand where this is coming from, honestly. I understand the frustration of dealing with layers of abstraction that do not fit perfectly within the established framework. I do not understand the resistance to avoiding global state and to propagating errors to the top level.

There is a trade-off. In order to use main as a library function, all the functions behind it have to pass around a context object, and all the functions that can fail have to be wrapped in ErrorOr. Some functions need no context and never fail, so maybe we don't have to wrap every function. But if you later find that some leaf function needs a context object or could fail, you have to update every function that can reach it. It's not too hard, but it is all a bit of a pain.
The context issue may be solved by making all the functions and the context data members of a class. You keep something like the convenience of global variables, accessible from all the linker functions, but without the usual global-variable problems of initialization and re-entry, so the class is suitable as part of a library. Most clang and LLVM classes work this way instead of passing contexts around.

2016-01-22 6:25 GMT+02:00 Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org>:

> There is a trade-off. In order to use main as a library function, all
> functions behind it have to pass around a context object, and all functions
> that possibly fail have to be wrapped with ErrorOr. [...]
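The class-based approach suggested above might look roughly like this sketch (hypothetical names, not lld's actual structure): all former globals become members, every linker function becomes a method that reaches them directly, and each link runs on a fresh instance, so the code is re-entrant.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Former globals become members; former free functions become methods.
// Each link gets its own instance, so state is initialized per run and
// two links in one process cannot interfere.
class Linker {
  std::vector<std::string> Inputs;  // was: a global input-file list
  int ErrorCount = 0;               // was: a global error counter

  void error(const std::string& Msg) {
    (void)Msg;  // a real linker would format this into a diagnostic stream
    ++ErrorCount;
  }

  void parseArgs(const std::vector<std::string>& Args) {
    for (const std::string& A : Args) {
      if (A.empty())
        error("empty argument");
      else
        Inputs.push_back(A);
    }
  }

public:
  // The narrow entry point: run one link, report success or failure.
  bool link(const std::vector<std::string>& Args) {
    parseArgs(Args);
    // A real linker would load inputs, resolve symbols, and write the
    // output here, with every phase sharing the member state above.
    return ErrorCount == 0 && !Inputs.empty();
  }
};
```

Because nothing lives at global scope, `Linker().link(argsA)` and `Linker().link(argsB)` can run back to back (or on separate threads, each with its own instance) without re-initialization hazards.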