> So the situation for LLD being used as a library is analogous to LLVM IR libraries being used from clang. In general, clang knows that it just generated the IR and that the IR is correct (otherwise it would be a bug in clang), and thus it disables the verifier.

That sounds good, but we can only usually trust our inputs. There are still some awfully crusty object files out in the world, so we need to verify any objects coming in from disk, and at least as it applies to libObject that doesn't fit the current lazy error handling scheme.

FWIW I believe Kevin Enderby has used a similar up-front object verification scheme before in CCTools, and we may end up implementing something like that again if we do a custom MachO class (and relegate MachOObjectFile to a view).

- Lang.

On Wed, Jan 13, 2016 at 4:41 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Having perfectly consistent object files does not necessarily mean that the linker is always able to link them, because of various higher-layer errors such as failing to resolve symbols. But yes, if you consider the compiler and the linker as one system, we can handle a corrupted file as an internal error that should never happen, and I think most error() calls are for such errors.
>
> On Wed, Jan 13, 2016 at 4:29 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>> On Thu, Jan 7, 2016 at 6:07 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> On Thu, Jan 7, 2016 at 5:12 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>>
>>>> On Thu, Jan 7, 2016 at 4:05 PM Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> By organizing it as a library, I'm expecting something coarse. I don't expect to reorganize the linker itself as a collection of small libraries, but to make the entire linker available as a library, so that you can link stuff in-process.
>>>>> More specifically, I expect that the library would basically export one function, link(std::vector<StringRef>), which takes command line arguments and returns a memory buffer for a newly created executable. We may want to allow a mix of StringRef and MemoryBuffer as input, so that you can directly pass in-memory objects to the linker, but the basic idea remains the same.
>>>>>
>>>>> Are we on the same page?

>>>> Let me answer this below, where I think you get to the core of the problem.

>>>>> On Thu, Jan 7, 2016 at 3:44 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>>>>
>>>>>> On Thu, Jan 7, 2016 at 3:18 PM Rui Ueyama <ruiu at google.com> wrote:
>>>>>>
>>>>>>> On Thu, Jan 7, 2016 at 2:56 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>>>>>>
>>>>>>>> On Thu, Jan 7, 2016 at 7:18 AM Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Jan 7, 2016 at 7:03 AM, Arseny Kapoulkine via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> In the process of migrating from the old lld ELF linker to the new one (previously ELF2) I noticed the interface lost several important features (ordered by importance for my use case):
>>>>>>>>>>
>>>>>>>>>> 1. Detecting errors in the first place. The new linker seems to call exit(1) for any error.
>>>>>>>>>>
>>>>>>>>>> 2. Reporting messages to non-stderr outputs. Previously all link functions had a raw_ostream argument, so it was possible to delay the error output, aggregate it for multiple linked files, output via a different format, etc.
>>>>>>>>>>
>>>>>>>>>> 3. Linking multiple outputs in parallel (useful for test drivers) in a single process.
>>>>>>>>>> Not really an interface issue, but there are at least two global pointers (Config & Driver) that refer to stack variables and are used in various places in the code.
>>>>>>>>>>
>>>>>>>>>> All of this seems to indicate a departure from the linker being usable as a library. To maintain the previous behavior you'd have to use a linker binary & popen.
>>>>>>>>>>
>>>>>>>>>> Is this a conscious design decision or a temporary limitation?

>>>>>>>>> That the new ELF and COFF linkers are designed as commands instead of libraries is very much an intended design change.

>>>>>>>> I disagree.
>>>>>>>>
>>>>>>>> During the discussion, there was a *specific* discussion of both the new COFF port and the ELF port continuing to be libraries with a common command line driver.

>>>>>>> There was a discussion that we would keep the same entry point for the old and the new, but I don't remember promising that we were going to organize the new linker as a library.

>>>>>> Ok, myself and essentially everyone else thought this was clear. If it isn't, let's clarify:
>>>>>>
>>>>>> I think it is absolutely critical that LLD's architecture remain one where all functionality is available as a library. This is *the* design goal of LLVM and all of LLVM's infrastructure. This applies just as much to LLD as it does to Clang.
>>>>>>
>>>>>> You say that it isn't compelling to match Clang's design, but in fact it is. You would need an overwhelming argument to *diverge* from Clang's design.
>>>>>>
>>>>>> The fact that it makes the design more challenging is not compelling at all. Yes, building libraries that can be re-used while keeping the binary that calls them equally efficient is more challenging, but that is the express mission of LLVM and every project within it.
>>>>>>> The new one is designed as a command from day one. (Precisely speaking, the original code propagates errors all the way up to the entry point, so you can call it and expect it to always return. Rafael introduced the error() function later, and we now depend on that function not returning.)

>>>>>> I think this last was a mistake.
>>>>>>
>>>>>> The fact that the code propagates errors all the way up is fine, and even good. We don't necessarily need to be able to *recover* from link errors and try some other path.
>>>>>>
>>>>>> But we absolutely need the design to be a *library* that can be embedded into other programs and tools. I can't even begin to count the use cases for this.
>>>>>>
>>>>>> So please, let's go back to where we *do not* rely on never-returning error handling. That is an absolute mistake.

>>>>>>>> If you want to consider changing that, we should have a fresh (and broad) discussion, but it goes pretty firmly against the design of the entire LLVM project. I also don't really understand why it would be beneficial.

>>>>>>> I'm not against organizing it as a library as long as it does not make things too complicated

>>>>>> I am certain that it will make things more complicated, but that is the technical challenge that we must overcome. It will be hard, but I am absolutely confident it is possible to have an elegant library design here. It may not be as simple as a pure command line tool, but it will be *dramatically* more powerful, general, and broadly applicable.
>>>>>>
>>>>>> The design of LLVM is not the simplest way to build a compiler. But it is valuable to all of those working on it precisely because of the flexibility imparted by its library-oriented design.
>>>>>> This is absolutely not something that we should lose from the linker.

>>>>>>> , and I guess reorganizing the existing code as a library is relatively easy because it's still pretty small, but I don't really want to focus on that until it becomes usable as an alternative to GNU ld or gold. I want to focus on the linker features themselves at this moment. Once it's complete, it will become clearer how to organize it.

>>>>>> Ok, now we're talking about something totally reasonable.
>>>>>>
>>>>>> If it is easier for you all to develop this first as a command line tool, and then make it work as a library, sure, go for it. You're doing the work, I can hardly tell you how to go about it. ;]

>>>>> It is not only easier for me to develop, but it is also super important for avoiding over-designing the library's API. Until we know what we need to do and what can be done, it is too easy to make the mistake of designing an API that is supposed to cover everything -- including hypothetical, unrealistic use cases. Such an API would slow down development significantly, and it's painful to abandon it once we realize it wasn't needed.

>>>> I'm very sympathetic to the problem of not wanting to design an API until the concrete use cases for it appear. That makes perfect sense.
>>>>
>>>> We just need to be *ready* to extend the library API (and potentially find a more fine-grained layering if one is actually called for) when a reasonable and real use case arises for some users of LLD. Once we have people that actually have a use case and want to introduce a certain interface to the library that supports it, we need to work with them to figure out how to effectively support their use case.
>>>> At the least, we clearly need the super simple interface[1] that the command line tool would use, but that an in-process linker could also probably use.

>>> Okay. I understand that a fairly large number of people want to use the linker without starting a new process, even if it just provides a super simple interface that is essentially equivalent to the command line options. That can be done by removing a few global variables and sprinkling ErrorOr<> in many places, so that you can call the linker's main() function from your program. That's bothersome but should not be that painful. I've put it on my todo list. It's not at the top of the list, but I recognize the need and will do it at some point. The current top priorities are speed and achieving feature parity with GNU -- we are trying to create a linker that everybody wants to switch to. Library design probably comes next. (And I guess if we succeed at the former, the need for the latter rises, since more people would want to use our linker.)

>> I remember talking with Rafael about some topics similar to what is in this thread, and he pointed out something that I think is very important: all the inputs to the linker are generated by other programs.
>>
>> So the situation for LLD being used as a library is analogous to LLVM IR libraries being used from clang. In general, clang knows that it just generated the IR and that the IR is correct (otherwise it would be a bug in clang), and thus it disables the verifier.
>>
>> I suspect a number of the "noreturn" error handling situations will be pretty much in line with LLVM's traditional use of report_fatal_error (which is essentially the same thing), and so we won't need to thread ErrorOr through quite as many places as we might initially suspect.
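The contrast between the noreturn error() style and ErrorOr-based propagation that Chandler, Rui, and Sean are debating can be sketched in isolation. This is a minimal stand-in, not lld's actual code: the real llvm::ErrorOr<T> lives in llvm/Support/ErrorOr.h and is richer than this mock, and parseSymbolCount is a hypothetical name invented for illustration.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <variant>

// The command-line style under discussion: convenient inside a
// standalone tool, but fatal for a library caller, because control
// never returns to the embedding program.
[[noreturn]] void error(const std::string &Msg) {
  std::cerr << "error: " << Msg << '\n';
  std::exit(1);
}

// A minimal stand-in for llvm::ErrorOr<T>: either a value or an
// error message, decided at runtime.
template <typename T> struct ErrorOr {
  std::variant<std::string, T> V;
  bool hasValue() const { return std::holds_alternative<T>(V); }
  const T &get() const { return std::get<T>(V); }
  const std::string &getError() const { return std::get<std::string>(V); }
};

// Library style: report a malformed input to the caller instead of
// terminating, so an in-process user can aggregate or reformat the
// diagnostics. (Hypothetical parsing routine for illustration only.)
ErrorOr<unsigned> parseSymbolCount(const std::string &Field) {
  if (Field.empty() ||
      Field.find_first_not_of("0123456789") != std::string::npos)
    return {std::string("malformed symbol count: '" + Field + "'")};
  return {unsigned(std::stoul(Field))};
}
```

The point Sean makes is that many failure sites may legitimately stay in the first style (like report_fatal_error), so only the paths a library caller actually needs to survive have to be converted to the second.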
>> -- Sean Silva

>>>> We might need minor extensions to effectively support Arseny's use case (I think an in-process linker is a *very* reasonable thing to support; I'd even like to teach the Clang driver to optionally work that way to be more efficient on platforms like Windows). But I have to imagine that the interfaces for an in-process static linker and the command line linker are extremely similar, if not precisely the same.
>>>>
>>>> At some point, it might also make sense to support more interesting linking scenarios, such as linking a PIC "shared object" that can be mapped into the running process for JIT users. But I think it is reasonable to build the interface that those users need when those users are ready to leverage LLD. That way we can work with them to make sure we don't build the wrong interface or an overly complicated one (as you say).

>>> I can imagine that there may be a chance to support such an API in the future, but I honestly don't know enough to say whether it makes sense at this moment. Linking against the current process image is pretty different from regular static linking, so most of the linker is probably not useful there. Some of the relocation handling might be found useful, but it is too early to say anything about that. We should revisit this when the linker becomes mature and an actual need arises.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
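The coarse entry point Rui describes earlier in the thread -- the entire linker behind one function taking command-line strings and returning a memory buffer -- can be sketched as follows. The names and signature here are illustrative only; nothing in the thread fixes the real interface, and std::string/std::optional stand in for StringRef/ErrorOr so the sketch is self-contained.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Output image of a link, standing in for llvm::MemoryBuffer.
struct OutputBuffer {
  std::vector<uint8_t> Bytes; // bytes of the linked executable
};

// Hypothetical single exported function: takes the same strings as the
// command line, returns the linked image on success and std::nullopt on
// failure -- crucially, it always returns instead of calling exit(1).
std::optional<OutputBuffer> link(const std::vector<std::string> &Args) {
  if (Args.empty())
    return std::nullopt; // nothing to link: fail, but let the caller decide
  OutputBuffer Out;
  // ... a real implementation would drive symbol resolution, layout,
  // and relocation here; this placeholder just emits an ELF magic ...
  Out.Bytes = {0x7f, 'E', 'L', 'F'};
  return Out;
}
```

An in-process caller (a test driver, or a Clang driver that skips the subprocess) would then call link() directly and inspect the result, which is exactly the use case Arseny's original message asks for.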
On Wed, Jan 13, 2016 at 6:03 PM, Lang Hames <lhames at gmail.com> wrote:

> FWIW I believe Kevin Enderby has used a similar up-front object verification scheme before in CCTools, and we may end up implementing something like that again if we do a custom MachO class (and relegate MachOObjectFile to a view).

I'd imagine that verifying all input object files up front at the start of linking would be a significant overhead, since it reads all data whether it will be used or not. For example, such a safeguard would read all relocations for comdat functions that will be uniquified and discarded by the linker. So if we take this path, I guess we don't want the verifier to be part of the linker, but should instead create it as an independent feature (probably in libObject), and call the verifier only for object files that could be broken (e.g. read from disk, as opposed to created by LLVM itself).
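Rui's policy -- keep verification out of the linker proper and pay for it only on inputs that can plausibly be broken -- can be sketched like this. Every name here is hypothetical; libObject has no verifyObject/acceptInput API in this form, and the "verifier" is a placeholder magic-number check.

```cpp
#include <cstdint>
#include <vector>

// An input to the link, tagged with where it came from.
struct InputObject {
  std::vector<uint8_t> Bytes;
  bool FromDisk; // false for objects LLVM itself just produced in memory
};

// Stand-in for an independent, libObject-style verifier; a real one
// would walk headers, section tables, and relocations.
bool verifyObject(const InputObject &Obj) {
  return Obj.Bytes.size() >= 4 &&
         Obj.Bytes[0] == 0x7f && Obj.Bytes[1] == 'E' &&
         Obj.Bytes[2] == 'L' && Obj.Bytes[3] == 'F';
}

// Only pay the verification cost for untrusted (on-disk) inputs;
// trusted in-memory buffers from the compiler skip it entirely.
bool acceptInput(const InputObject &Obj) {
  return !Obj.FromDisk || verifyObject(Obj);
}
```

This keeps the fast path (compiler-generated buffers) free of the overhead Rui is worried about, while crusty on-disk files still get checked before the linker relies on them.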
Hi Rui,

Yep. "Verifying up front" is an oversimplification. I don't know the details of Kevin's original scheme in CCTools, but I'm imagining that we would provide coarse-grained verification for different portions of the object file (e.g. verification of all load commands, of the symbol tables, or of the relocations for a section), while providing accessors that assume a well-formed data structure. If and when verification gets run would be up to the tools that use the class.

For the linker, you could defer most of the verification until you know an object file will actually be used. If the linker is consuming trusted input from the compiler, you might forgo verification altogether (that's analogous to how we treat IR, as Sean pointed out). On the other hand, something like llvm-objdump might run all verification up front, since it's not time-sensitive, and it's nice to be warned loudly about malformed objects even if the portion you asked about wasn't malformed.

As for the impact on library design: any library interface that can't assume good input is going to have some sort of error return. None of this affects that fundamental requirement, but it might change the granularity of error-checking within the library.

Cheers,
Lang.

On Wed, Jan 13, 2016 at 7:07 PM, Rui Ueyama <ruiu at google.com> wrote:

> I'd imagine that verifying all input object files beforehand at the start of linking would be a significant overhead since it reads all data whether it will be used or not. [...]
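The split Lang describes -- coarse-grained verify entry points a tool can run up front (llvm-objdump style) or defer until an object is actually selected for the link, next to cheap accessors that simply assume the data is well-formed -- might look roughly like this. All names are hypothetical; this is not the MachOObjectFile API, and the checks are placeholders.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// A "view" over raw object bytes, in the spirit of relegating
// MachOObjectFile to a view (names invented for this sketch).
class ObjectView {
  std::vector<uint8_t> Data;
public:
  explicit ObjectView(std::vector<uint8_t> D) : Data(std::move(D)) {}

  // Coarse-grained verifiers: each checks one portion of the file and
  // returns an error message on failure. The caller decides when (or
  // whether) to run them.
  std::optional<std::string> verifyHeader() const {
    if (Data.size() < 4)
      return "file too small to hold a header";
    return std::nullopt;
  }
  std::optional<std::string> verifySymbolTable() const {
    // ... real bounds checks on the symbol table would go here ...
    return std::nullopt;
  }

  // Accessor: assumes verifyHeader() has been honored, so it needs no
  // error return. Reads the magic as a little-endian 32-bit value.
  uint32_t magic() const {
    return uint32_t(Data[0]) | uint32_t(Data[1]) << 8 |
           uint32_t(Data[2]) << 16 | uint32_t(Data[3]) << 24;
  }
};
```

The design point matches Lang's last paragraph: only the verifiers carry error returns, while the accessors stay cheap, so the granularity of error-checking moves to a few coarse entry points instead of being threaded through every getter.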
>> >> FWIW I believe Kevin Enderby has used a similar up-front object >> verification scheme before in CCTools, and we may end up implementing >> something like that again if we do a custom MachO class (and relegate >> MachOObjectFile to a view). >> > > I'd imagine that verifying all input object files beforehand at the start > of linking would be a significant overhead since it reads all data whether > it will be used or not. For example, such safeguard would read all > relocations for comdat functions which will be uniquified and discarded by > the linker. So if we take this pass, I guess we don't want to have the > verifier as a part of the linker, but instead create that as an independent > feature (probably in libObject), and we call the verifier only for object > files that can be broken (e.g. read from disk instead of created by LLVM > itself.) > > >> - Lang. >> >> On Wed, Jan 13, 2016 at 4:41 PM, Rui Ueyama via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Having perfectly consistent object files does not necessarily mean that >>> the linker is always able to link them because of various higher layer >>> errors such as failing to resolve symbols. But yes, if you consider a >>> compiler and the linker as one system, we can handle corrupted file as an >>> internal error that should never happen, and I think most of error() calls >>> are for such errors. >>> >>> On Wed, Jan 13, 2016 at 4:29 PM, Sean Silva <chisophugis at gmail.com> >>> wrote: >>> >>>> >>>> >>>> On Thu, Jan 7, 2016 at 6:07 PM, Rui Ueyama via llvm-dev < >>>> llvm-dev at lists.llvm.org> wrote: >>>> >>>>> On Thu, Jan 7, 2016 at 5:12 PM, Chandler Carruth <chandlerc at gmail.com> >>>>> wrote: >>>>> >>>>>> On Thu, Jan 7, 2016 at 4:05 PM Rui Ueyama <ruiu at google.com> wrote: >>>>>> >>>>>>> By organizing it as a library, I'm expecting something coarse. 
I >>>>>>> don't expect to reorganize the linker itself as a collection of small >>>>>>> libraries, but make the entire linker available as a library, so that you >>>>>>> can link stuff in-process. More specifically, I expect that the library >>>>>>> would basically export one function, link(std::vector<StringRef>), which >>>>>>> takes command line arguments, and returns a memory buffer for a newly >>>>>>> created executable. We may want to allow a mix of StringRef and >>>>>>> MemoryBuffer as input, so that you can directly pass in-memory objects to >>>>>>> the linker, but the basic idea remains the same. >>>>>>> >>>>>>> Are we on the same page? >>>>>>> >>>>>> >>>>>> Let me answer this below, where I think you get to the core of the >>>>>> problem. >>>>>> >>>>>> >>>>>>> >>>>>>> On Thu, Jan 7, 2016 at 3:44 PM, Chandler Carruth < >>>>>>> chandlerc at gmail.com> wrote: >>>>>>> >>>>>>>> On Thu, Jan 7, 2016 at 3:18 PM Rui Ueyama <ruiu at google.com> wrote: >>>>>>>> >>>>>>>>> On Thu, Jan 7, 2016 at 2:56 PM, Chandler Carruth < >>>>>>>>> chandlerc at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> On Thu, Jan 7, 2016 at 7:18 AM Rui Ueyama via llvm-dev < >>>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>>> >>>>>>>>>>> On Thu, Jan 7, 2016 at 7:03 AM, Arseny Kapoulkine via llvm-dev < >>>>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> In the process of migrating from old lld ELF linker to new >>>>>>>>>>>> (previously ELF2) I noticed the interface lost several important features >>>>>>>>>>>> (ordered by importance for my use case): >>>>>>>>>>>> >>>>>>>>>>>> 1. Detecting errors in the first place. New linker seems to >>>>>>>>>>>> call exit(1) for any error. >>>>>>>>>>>> >>>>>>>>>>>> 2. Reporting messages to non-stderr outputs. Previously all >>>>>>>>>>>> link functions had a raw_ostream argument so it was possible to delay the >>>>>>>>>>>> error output, aggregate it for multiple linked files, output via a >>>>>>>>>>>> different format, etc. 
>>>>>>>>>>>> 3. Linking multiple outputs in parallel (useful for test
>>>>>>>>>>>> drivers) in a single process. Not really an interface issue, but
>>>>>>>>>>>> there are at least two global pointers (Config & Driver) that
>>>>>>>>>>>> refer to stack variables and are used in various places in the
>>>>>>>>>>>> code.
>>>>>>>>>>>>
>>>>>>>>>>>> All of this seems to indicate a departure from the linker being
>>>>>>>>>>>> usable as a library. To maintain the previous behavior you'd have
>>>>>>>>>>>> to use a linker binary & popen.
>>>>>>>>>>>>
>>>>>>>>>>>> Is this a conscious design decision or a temporary limitation?
>>>>>>>>>>>
>>>>>>>>>>> That the new ELF and COFF linkers are designed as commands
>>>>>>>>>>> instead of libraries is very much an intended design change.
>>>>>>>>>>
>>>>>>>>>> I disagree.
>>>>>>>>>>
>>>>>>>>>> During the discussion, there was a *specific* discussion of both
>>>>>>>>>> the new COFF port and the ELF port continuing to be libraries with a
>>>>>>>>>> common command line driver.
>>>>>>>>>
>>>>>>>>> There was a discussion that we would keep the same entry point for
>>>>>>>>> the old and the new, but I don't remember if I promised that we were
>>>>>>>>> going to organize the new linker as a library.
>>>>>>>>
>>>>>>>> OK, I and essentially everyone else thought this was clear. If it
>>>>>>>> isn't, let's clarify:
>>>>>>>>
>>>>>>>> I think it is absolutely critical and important that LLD's
>>>>>>>> architecture remain one where all functionality is available as a
>>>>>>>> library. This is *the* design goal of LLVM and all of LLVM's
>>>>>>>> infrastructure. This applies just as much to LLD as it does to Clang.
>>>>>>>>
>>>>>>>> You say that it isn't compelling to match Clang's design, but in
>>>>>>>> fact it is. You would need an overwhelming argument to *diverge* from
>>>>>>>> Clang's design.
>>>>>>>>
>>>>>>>> The fact that it makes the design more challenging is not
>>>>>>>> compelling at all.
>>>>>>>> Yes, building libraries that can be re-used, and making the binary
>>>>>>>> that calls them equally efficient, is more challenging, but that is
>>>>>>>> the express mission of LLVM and every project within it.
>>>>>>>>
>>>>>>>>> The new one is designed as a command from day one. (Precisely
>>>>>>>>> speaking, the original code propagates errors all the way up to the
>>>>>>>>> entry point, so you can call it and expect it to always return.
>>>>>>>>> Rafael introduced the error() function later, and we now depend on
>>>>>>>>> that function not returning.)
>>>>>>>>
>>>>>>>> I think this last was a mistake.
>>>>>>>>
>>>>>>>> The fact that the code propagates errors all the way up is fine,
>>>>>>>> and even good. We don't necessarily need to be able to *recover* from
>>>>>>>> link errors and try some other path.
>>>>>>>>
>>>>>>>> But we absolutely need the design to be a *library* that can be
>>>>>>>> embedded into other programs and tools. I can't even begin to count
>>>>>>>> the use cases for this.
>>>>>>>>
>>>>>>>> So please, let's go back to where we *do not* rely on
>>>>>>>> never-returning error handling. That is an absolute mistake.
>>>>>>>>
>>>>>>>>>> If you want to consider changing that, we should have a fresh
>>>>>>>>>> (and broad) discussion, but it goes pretty firmly against the design
>>>>>>>>>> of the entire LLVM project. I also don't really understand why it
>>>>>>>>>> would be beneficial.
>>>>>>>>>
>>>>>>>>> I'm not against organizing it as a library as long as it does not
>>>>>>>>> make things too complicated
>>>>>>>>
>>>>>>>> I am certain that it will make things more complicated, but that is
>>>>>>>> the technical challenge that we must overcome. It will be hard, but I
>>>>>>>> am absolutely confident it is possible to have an elegant library
>>>>>>>> design here. It may not be as simple as a pure command line tool, but
>>>>>>>> it will be *dramatically* more powerful, general, and broadly
>>>>>>>> applicable.
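[Editor's note: the error-handling contrast drawn above — errors propagated
to the caller versus a noreturn error() that exits the process — can be
sketched in a few lines. This is a toy stand-in, not LLVM's actual ErrorOr
class; all names here are illustrative.]

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Toy stand-in for llvm::ErrorOr<T>: holds either a result or an error
// message. The real class wraps std::error_code; this is just the shape.
using SymbolOrError = std::variant<uint64_t, std::string>;

// Library-friendly style: the failure is returned to the caller, who
// decides whether to print, aggregate, or abort -- nothing calls exit().
SymbolOrError resolveSymbol(const std::map<std::string, uint64_t> &SymTab,
                            const std::string &Name) {
  auto It = SymTab.find(Name);
  if (It == SymTab.end())
    return std::string("undefined symbol: " + Name);
  return It->second;
}

// The command-line-only style being criticized would instead look like:
//   [[noreturn]] void error(const std::string &Msg) {
//     fprintf(stderr, "%s\n", Msg.c_str());
//     exit(1);  // unusable when the linker is embedded in another process
//   }
```

Either style can report the same diagnostics; the difference is only who
gets to decide what happens after the first failure.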
>>>>>>>> The design of LLVM is not the simplest way to build a compiler. But
>>>>>>>> it is valuable to all of those working on it precisely because of the
>>>>>>>> flexibility imparted by its library-oriented design. This is
>>>>>>>> absolutely not something that we should lose from the linker.
>>>>>>>>
>>>>>>>>> , and I guess reorganizing the existing code as a library is
>>>>>>>>> relatively easy because it's still pretty small, but I don't really
>>>>>>>>> want to focus on that until it becomes usable as an alternative to
>>>>>>>>> GNU ld or gold. I want to focus on the linker features themselves at
>>>>>>>>> this moment. Once it's complete, it will become clearer how to
>>>>>>>>> organize it.
>>>>>>>>
>>>>>>>> Ok, now we're talking about something totally reasonable.
>>>>>>>>
>>>>>>>> If it is easier for you all to develop this first as a command line
>>>>>>>> tool, and then make it work as a library, sure, go for it. You're
>>>>>>>> doing the work, I can hardly tell you how to go about it. ;]
>>>>>>>
>>>>>>> It is not only easier for me to develop, but also super important
>>>>>>> for avoiding over-designing the API of the library. Until we know what
>>>>>>> we need to do and what can be done, it is too easy to make the mistake
>>>>>>> of designing an API that is supposed to cover everything -- including
>>>>>>> hypothetical, unrealistic use cases. Such an API would slow down
>>>>>>> development significantly, and it's painful to abandon it once we
>>>>>>> realize it was not needed.
>>>>>>
>>>>>> I'm very sympathetic to the problem of not wanting to design an API
>>>>>> until the concrete use cases for it appear. That makes perfect sense.
>>>>>>
>>>>>> We just need to be *ready* to extend the library API (and potentially
>>>>>> find a more fine-grained layering if one is actually called for) when a
>>>>>> reasonable and real use case arises for some users of LLD.
>>>>>> Once we have people that actually have a use case and want to
>>>>>> introduce a certain interface to the library that supports it, we need
>>>>>> to work with them to figure out how to effectively support their use
>>>>>> case.
>>>>>>
>>>>>> At the least, we clearly need the super simple interface[1] that the
>>>>>> command line tool would use, but an in-process linker could also
>>>>>> probably use.
>>>>>
>>>>> Okay. I understand that a fairly large number of people want to use
>>>>> the linker without starting a new process, even if it just provides a
>>>>> super simple interface which is essentially equivalent to the command
>>>>> line options. That can be done by removing a few global variables and
>>>>> sprinkling ErrorOr<> in many places, so that you can call the linker's
>>>>> main() function from your program. That's bothersome but should not be
>>>>> that painful. I put it on my todo list. It's not at the top of the list,
>>>>> but I recognize the need and will do it at some point. Current top
>>>>> priorities are speed and achieving feature parity with GNU -- we are
>>>>> trying to create a linker that everybody wants to switch to. Library
>>>>> design probably comes next. (And I guess if we succeed on the former,
>>>>> the need for the latter rises, since more people would want to use our
>>>>> linker.)
>>>>
>>>> I remember talking with Rafael about some topics similar to what is in
>>>> this thread, and he pointed out something that I think is very
>>>> important: all the inputs to the linker are generated by other programs.
>>>> So the situation for LLD being used as a library is analogous to LLVM
>>>> IR libraries being used from clang. In general, clang knows that it just
>>>> generated the IR and that the IR is correct (otherwise it would be a bug
>>>> in clang), and thus it disables the verifier.
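[Editor's note: the up-front verification idea discussed in this thread —
trust objects your own compiler just produced, but check untrusted files
from disk once, before any lazy accessors rely on them — can be sketched
against a toy object format. The format, struct names, and verify() below
are invented for illustration; this is not libObject's API.]

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Toy "object file": a fixed header declaring where a section table lives,
// followed by raw bytes. Nothing here is a real object format.
struct Header  { uint32_t SecTabOff; uint32_t SecCount; };
struct Section { uint32_t Off; uint32_t Size; };

// One up-front pass that checks every declared range lies inside the
// buffer. A linker could run this only on files read from disk, and skip
// it for objects a trusted compiler just produced in memory.
bool verify(const std::vector<uint8_t> &Buf) {
  if (Buf.size() < sizeof(Header))
    return false; // truncated header
  Header H;
  std::memcpy(&H, Buf.data(), sizeof(H));
  uint64_t TabEnd =
      uint64_t(H.SecTabOff) + uint64_t(H.SecCount) * sizeof(Section);
  if (TabEnd > Buf.size())
    return false; // section table out of bounds
  for (uint32_t I = 0; I != H.SecCount; ++I) {
    Section S;
    std::memcpy(&S, Buf.data() + H.SecTabOff + I * sizeof(Section),
                sizeof(S));
    if (uint64_t(S.Off) + S.Size > Buf.size())
      return false; // section data points outside the file
  }
  return true;
}
```

The trade-off Rui raises applies directly: this pass touches every section
header even for data the link will later discard, which is exactly why one
might want it as an optional, separate feature rather than an unconditional
step inside the linker.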
>>>> I suspect a number of the "noreturn" error handling situations will be
>>>> pretty much in line with LLVM's traditional use of report_fatal_error
>>>> (which is essentially the same thing), and so we won't need to thread
>>>> ErrorOr through quite as many places as we might initially suspect.
>>>>
>>>> -- Sean Silva
>>>>
>>>>>> We might need minor extensions to effectively support Arseny's use
>>>>>> case (I think an in-process linker is a *very* reasonable thing to
>>>>>> support; I'd even like to teach the Clang driver to optionally work
>>>>>> that way to be more efficient on platforms like Windows). But I have to
>>>>>> imagine that the interface for an in-process static linker and the
>>>>>> command line linker are extremely similar if not precisely the same.
>>>>>>
>>>>>> At some point, it might also make sense to support more interesting
>>>>>> linking scenarios such as linking a PIC "shared object" that can be
>>>>>> mapped into the running process for JIT users. But I think it is
>>>>>> reasonable to build the interface that those users need when those
>>>>>> users are ready to leverage LLD. That way we can work with them to make
>>>>>> sure we don't build the wrong interface or an overly complicated one
>>>>>> (as you say).
>>>>>
>>>>> I can imagine that there may be a chance to support such an API in the
>>>>> future, but I honestly don't know enough to say whether it makes sense
>>>>> or not at this moment. Linking against the current process image is
>>>>> pretty different from regular static linking, so most parts of the
>>>>> linker are probably not useful. Some parts of relocation handling might
>>>>> be useful, but it is too early to say anything about that. We should
>>>>> revisit this when the linker becomes mature and an actual need arises.
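[Editor's note: the "super simple interface" proposed in the thread — one
link() function taking argv-style strings and returning the output image in
memory — reduced to a self-contained toy. std::optional and std::string
stand in for ErrorOr and MemoryBuffer, the function name and fake "image"
are invented, and the linking itself is stubbed out.]

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical single entry point, shaped like the proposal in the thread:
// takes command-line-style arguments and returns the output image in
// memory (std::nullopt on failure). No global state, so it can be called
// repeatedly within one process -- the in-process use case above.
std::optional<std::string> runLinker(const std::vector<std::string> &Args) {
  std::vector<std::string> Inputs;
  std::string Output = "a.out";
  for (size_t I = 0; I < Args.size(); ++I) {
    if (Args[I] == "-o" && I + 1 < Args.size())
      Output = Args[++I];
    else
      Inputs.push_back(Args[I]);
  }
  if (Inputs.empty())
    return std::nullopt; // error: no input files
  // Stand-in for actual linking: concatenate the input names.
  std::string Image = "EXE(" + Output + "):";
  for (const std::string &In : Inputs)
    Image += In + ";";
  return Image;
}
```

A caller embedding this in a test driver would inspect or write out the
returned buffer itself, rather than the linker touching stderr or exit().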
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Rafael Espíndola via llvm-dev
2016-Jan-20 15:30 UTC
[llvm-dev] lld: ELF/COFF main() interface
Sorry for being late on this thread. I just wanted to say I am strongly on
Rui's side on this one.

The current design is for lld *not* to be a library, and I think that is
important. It has saved us a tremendous amount of work writing library-like
code and designing library interfaces. The comparison of the old and new
ELF code is night and day as far as productivity and performance are
concerned.

Designing a library interface right now would be premature because it is
not clear what the commonalities will be or how to refactor them. For
example, both MCJIT and lld apply relocations, but there are tremendously
different options for how to factor this:

* Have MC produce position-dependent code, so MCJIT would be a bit more
like other JITs and not need relocations.
* Move relocation processing somewhere into LLVM and have lld and MCJIT
use it.
* Have MC produce shared objects directly, saving MCJIT the complication
of using relocatable objects.
* Have MCJIT use lld as a trivial library that implements
"ld foo.o -o foo.so -shared".

The situation is even less clear for the other parts we are missing in
llvm: objcopy, readelf, etc. We have to discuss and prototype these before
we can make a decision. Committing now would be premature design and would
stall progress on the one thing we are sure we need: a high quality,
BSD-licensed linker. Let's get that implemented. Meanwhile, MCJIT will move
along, and we will be in a position to productively discuss what can be
shared and at what cost (complexity and performance).

Last but not least, anything that is not needed in two different areas
should remain application code. The only point of paying the complexity
cost of writing a library is if it is used.

Cheers,
Rafael
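[Editor's note: one of the factoring options above is moving relocation
processing somewhere both lld and MCJIT could share. A minimal sketch of
what such a shared primitive might look like, modeled loosely on a 32-bit
PC-relative relocation (the x86-64 R_X86_64_PC32 convention, roughly); the
function names are illustrative, not LLD's implementation.]

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Shared primitive both a static linker and a JIT could call: patch a
// 32-bit PC-relative field. Value written = Target + Addend - Place,
// where Place is the runtime address of the field being patched.
void applyPCRel32(std::vector<uint8_t> &Section, size_t Offset,
                  uint64_t Place, uint64_t Target, int64_t Addend) {
  int32_t Value = static_cast<int32_t>(Target + Addend - Place);
  std::memcpy(Section.data() + Offset, &Value, sizeof(Value));
}

// Helper for reading the patched field back (e.g. in tests).
int32_t readPCRel32(const std::vector<uint8_t> &Section, size_t Offset) {
  int32_t Value;
  std::memcpy(&Value, Section.data() + Offset, sizeof(Value));
  return Value;
}
```

The factoring question in the message is exactly whether a primitive like
this belongs in LLVM proper, in lld, or duplicated in each consumer until a
real second user appears.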