thr3ads.net - llvm dev - [llvm-dev] RFC: ELF Autolinking [Mar 2019]

bd1976 llvm via llvm-dev

2019-Mar-14 16:44 UTC

[llvm-dev] RFC: ELF Autolinking

On Thu, Mar 14, 2019 at 3:32 PM Peter Smith <peter.smith at linaro.org>
wrote:
> Hello,
>
> I've put some comments on the proposal inline. Having had to debug
> library selection problems where all the libraries are visible on the
> linker command line, I would prefer if people didn't embed difficult
> to find directives in object files, but I'm guessing in some languages
> this is the natural way of adding libraries.
>
> On Thu, 14 Mar 2019 at 13:08, bd1976 llvm via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > At Sony we offer autolinking as a feature in our ELF toolchain. We would
> > like to see full support for this feature upstream as there is anecdotal
> > evidence that it would find use beyond Sony.
> >
>
> I've not got any use of the existing code. Personally I've not come
> across anyone wanting this type of feature, but that is also anecdotal
> on my part.
>
> >
> > For ELF we need limited autolinking support. Specifically, we only need
> > support for "comment lib" pragmas
> > (https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017)
> > in C/C++, e.g. #pragma comment(lib, "foo"). My suggestion is that we keep
> > the implementation as lean as possible.
> >
> > Principles to guide the implementation:
> > - Developers should be able to easily understand autolinking behavior.
> > - Developers should be able to override autolinking from the linker
> >   command line.
> > - Inputs specified via pragmas should be handled in a general way to
> >   allow the same source code to work in different environments.
> >
> > I would like to propose that we focus on autolinking exclusively and
> > that we divorce the implementation from the idea of "linker options"
> > which, by nature, would tie source code to the vagaries of particular
> > linkers. I don't see much value in supporting other linker operations so
> > I suggest that the binary representation be a mergeable string section
> > (SHF_MERGE, SHF_STRINGS), called .autolink, with custom type
> > SHT_LLVM_AUTOLINK (0x6fff4c04), and SHF_EXCLUDE set (to avoid the
> > contents appearing in the output). The compiler can form this section by
> > concatenating the arguments of the "comment lib" pragmas in the order
> > they are encountered. Partial (-r, -Ur) links can be handled by
> > concatenating .autolink sections with the normal mergeable string section
> > rules. The current .linker-options can remain (or be removed); but,
> > "comment lib" pragmas for ELF should be lowered to .autolink not to
> > .linker-options. This makes sense as there is no linker option that
> > "comment lib" pragmas map directly to. As an example,
> > #pragma comment(lib, "foo") would result in:
> >
> > .section ".autolink","eMS",@llvm_autolink,1
> >         .asciz "foo"
> >
> > For LTO, equivalent information to the contents of the .autolink
> > section will be written to the IRSymtab so that it is available to the
> > linker for symbol resolution.
> >
>
> I'm not sure I understand the bit about "for symbol resolution". I
> think that what you mean is that you will encode the autolink section
> using symbols instead of as a section, and the linker is expected to
> extract this when it reads the symbol table?
>

Whoops... might have used a bit of a colloquialism there; sorry. All I mean
is that there will be a method on the IRSymtab that LLD can use to retrieve
the same set of strings that would be written into the .autolink
section of the relocatable object files by the backend.
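
To make the "same set of strings" concrete, here is a minimal sketch of how a consumer might split the raw payload of such a section into its entries. It assumes only what the proposal states (a SHF_STRINGS section, so NUL-terminated strings laid end to end); the function name parseAutolinkStrings is hypothetical and is not an LLD or IRSymtab API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split the raw bytes of an .autolink-style section into its entries.
// Each entry is a NUL-terminated string; order is preserved.
// Hypothetical helper for illustration only.
std::vector<std::string> parseAutolinkStrings(std::string_view data) {
  std::vector<std::string> entries;
  while (!data.empty()) {
    std::size_t end = data.find('\0');
    if (end == std::string_view::npos)
      end = data.size(); // tolerate a missing final terminator
    if (end > 0)
      entries.emplace_back(data.substr(0, end)); // skip empty strings
    // Drop the entry and, if present, its terminator.
    std::size_t consumed = end < data.size() ? end + 1 : end;
    assert(consumed <= data.size()); // remove_prefix requires n <= size()
    data.remove_prefix(consumed);
  }
  return entries;
}
```

For a translation unit with two "comment lib" pragmas naming "foo" and "bar", the payload would be the eight bytes "foo\0bar\0" and the sketch yields the two names in source order.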

> > The linker will process the .autolink strings in the following way:
> >
> > 1. Inputs from the .autolink sections of a relocatable object file are
> >    added when the linker decides to include that file (which could itself
> >    be in a library) in the link. Autolinked inputs behave as if they were
> >    appended to the command line as a group after all other options. As a
> >    consequence the set of autolinked libraries is searched last to
> >    resolve symbols.
> > 2. It is an error if a file cannot be found for a given string.
> > 3. Any command line options in effect at the end of the command line
> >    parsing apply to autolinked inputs, e.g. --whole-archive.
>
> I've not got any experience of autolinking as a user, so I'm
> struggling a bit with this one. I'm guessing that autolinking is
> useful because someone can do the equivalent of #include <library.h>
> and #pragma comment lib "library.so" in the same place without having
> to fight the build system.

Right. Consider that many codebases have multiple build configurations and
the linker needs to be given the correct version of a library to use for
the particular build configuration. This is often easier to do using the
preprocessor than in the build system. Also, if a program is dependent on
an external library, autolinking allows the library writer to reorganize
how that library is structured transparently to the users of the library.
There are notes about utility in
https://stackoverflow.com/questions/1685206/pragma-commentlib-xxx-lib-equivalent-under-linux
and
https://stackoverflow.com/questions/3851956/whats-pragma-comment-lib-lib-glut32-lib?noredirect=1&lq=1
.

> I'm less convinced about --whole-archive as
> I think this tends to be a way of structuring the build and would be
> best made explicit in the build system. Moreover, what if someone
> wants to not use --whole-archive, for their autolink, but one already
> exists.

Then they can specify --no-whole-archive on the end of the command line, no?

> This could be quite difficult to check with a large project.
> Personally I'd have the user be explicit in the .autolink whether they
> were intending it to be whole-archive or not.
>
I was hoping to avoid this as I want to avoid getting into how to specify
linker specific options in the frontend. If we dislike the idea that the
state of the command line parser at the end of the linker command line
affects the autolinked libraries then I would rather go for a scheme in
which the default state of the command line parser applies when linking the
autolinked libraries; however, that seems harder to implement in LLD and
gives the user less control over autolinking.
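
As an illustration of "the state of the command line parser at the end of the linker command line", the following sketch scans an argument list and reports whether --whole-archive is still in effect after the last argument. Only the two real flags --whole-archive and --no-whole-archive are modelled; the function name wholeArchiveAtEnd is hypothetical and the code is not taken from any linker.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the --whole-archive state in effect after the last argument.
// Under point 3 of the proposal this single flag would apply to every
// autolinked input. Hypothetical helper for illustration only.
bool wholeArchiveAtEnd(const std::vector<std::string> &args) {
  bool wholeArchive = false; // linker default
  for (const std::string &arg : args) {
    if (arg == "--whole-archive")
      wholeArchive = true;
    else if (arg == "--no-whole-archive")
      wholeArchive = false;
  }
  return wholeArchive;
}
```

Because the result is a single boolean, the scheme is all-or-nothing for the autolinked inputs: appending --no-whole-archive to the command line resets it for all of them at once.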

>
> > 4. Duplicate autolinked inputs are ignored.
>
> If we take the issue of --whole-archive off the table does it matter
> that there are duplicate libraries? Unresolved symbols will match
> against the first library.

It doesn't matter for libraries in LLD; but, it is important for object
files. I think that this mechanism should be usable for object files and
libraries. This is common in ELF linkers - for example the --library
command line option can be used to link object files.
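
A minimal sketch of points 1 and 4 taken together: autolinked inputs are appended after the command-line inputs, and a string that has already been autolinked is skipped. Comparing by string rather than by resolved file is an assumption made here for brevity, and appendAutolinked is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Append autolinked inputs after the command-line inputs, keeping only the
// first occurrence of each autolinked string (assumption: duplicates are
// detected by string equality). Hypothetical helper for illustration only.
std::vector<std::string>
appendAutolinked(std::vector<std::string> commandLine,
                 const std::vector<std::string> &autolinked) {
  std::unordered_set<std::string> seen;
  for (const std::string &name : autolinked)
    if (seen.insert(name).second) // true only for the first occurrence
      commandLine.push_back(name);
  return commandLine;
}
```

Skipping the repeat matters for an object file such as bar.o, since linking it twice would normally produce duplicate symbol definitions, whereas a repeated archive is harmless.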

> I guess it might make a difference if this
> feature is implemented in ld.lld and ld.gold, where you'd have to wrap
> the libraries in a start-group, end-group, but is this likely to
> happen?
>
I would like the design to be such that it could be implemented by GNU.

>
> > 5. The linker tries to add a library or relocatable object file from
> >    each of the strings in a .autolink section by: first, handling the
> >    string as if it was specified on the command line; second, looking for
> >    the string in each of the library search paths in turn; third, looking
> >    for a lib<string>.a or lib<string>.so (depending on the current mode
> >    of the linker) in each of the library search paths.
>
> There is some precedent for including files and libraries from
> linkerscripts
> https://sourceware.org/binutils/docs/ld/File-Commands.html#File-Commands
> , these distinguish between "-lfile" and "file". Would this be a
> better fit for a ld.bfd interface compatible linker?
>

I was hoping to avoid GNUisms and use a "general" mechanism. MSVC source
code compatibility is a use case.
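
The three-step lookup in point 5 can be sketched as follows. This is only an illustration under stated assumptions: the file system is abstracted behind a caller-supplied predicate, "the current mode of the linker" is modelled as a single isStatic flag (static mode considers only lib<string>.a, otherwise lib<string>.so is tried before lib<string>.a), and the names resolveAutolink and FileExists are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical predicate: returns true if a file exists at the given path.
using FileExists = std::function<bool(const std::string &)>;

// Resolve one .autolink string following the three steps of point 5.
// Returns std::nullopt when nothing is found; under point 2 of the
// proposal the caller would then report an error.
std::optional<std::string>
resolveAutolink(const std::string &name,
                const std::vector<std::string> &searchPaths, bool isStatic,
                const FileExists &exists) {
  // 1. Treat the string as if it had been given on the command line.
  if (exists(name))
    return name;
  // 2. Look for the string itself in each library search path in turn.
  for (const std::string &dir : searchPaths) {
    std::string candidate = dir + "/" + name;
    if (exists(candidate))
      return candidate;
  }
  // 3. Look for lib<string>.so / lib<string>.a in each search path.
  for (const std::string &dir : searchPaths) {
    if (!isStatic) {
      std::string shared = dir + "/lib" + name + ".so";
      if (exists(shared))
        return shared;
    }
    std::string archive = dir + "/lib" + name + ".a";
    if (exists(archive))
      return archive;
  }
  return std::nullopt;
}
```

With this ordering a pragma naming "foo" and a pragma naming "bar.o" can both be satisfied without the source code spelling out -l or a directory, which is the "general" behaviour described above.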

> > 6. A new command line option --no-llvm-autolink will tell LLD to ignore
> >    the .autolink sections.
>
> Personally I would have thought --no-llvm-autolink would error if it
> found a .autolink section, on the grounds that I wanted all the
> libraries to be defined on the command-line or linker script rather
> than hidden in object files. I would have thought ignoring the
> autolink sections would in most cases result in undefined symbols. If
> there is a use case for it, perhaps --ignore-llvm-autolink.
>

The use case that I had in mind is that you need to override autolinking. To
do so you tell the linker to ignore the embedded autolinking information
and construct an equivalent command line. I think your proposed
--ignore-llvm-autolink is a better name for this option given the intended
semantics.

> > Rationale for the above points:
> >
> > 1. Adding the autolinked inputs last makes the process simple to
> >    understand from a developer's perspective. All linkers are able to
> >    implement this scheme.
> > 2. Erroring for libraries that are not found seems like better behavior
> >    than failing the link during symbol resolution.
> > 3. It seems useful for the user to be able to apply command line options
> >    which will affect all of the autolinked input files. There is a
> >    potential problem of surprise for developers, who might not realize
> >    that these options would apply to the "invisible" autolinked input
> >    files; however, despite the potential for surprise, this is easy for
> >    developers to reason about and gives developers the control that they
> >    may require.
> > 4. Unlike on the command line it is probably easy to include the same
> >    input file twice via pragmas and might be a pain to fix; think of
> >    third-party libraries supplied as binaries.
> > 5. This algorithm takes into account all of the different ways that ELF
> >    linkers find input files. The different search methods are tried by
> >    the linker in most obvious to least obvious order.
> > 6. I considered adding finer grained control over which .autolink inputs
> >    were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I
> >    concluded that this is not necessary: if finer control is required
> >    developers can recreate the same effect autolinking would have had
> >    using command line options.
> >
> > Thoughts?
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Rui Ueyama via llvm-dev

2019-Mar-14 17:57 UTC

[llvm-dev] RFC: ELF Autolinking

On Thu, Mar 14, 2019 at 9:45 AM bd1976 llvm via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Thu, Mar 14, 2019 at 3:32 PM Peter Smith <peter.smith at linaro.org>
> wrote:
>
>> I'm less convinced about --whole-archive as
>> I think this tends to be a way of structuring the build and would be
>> best made explicit in the build system. Moreover, what if someone
>> wants to not use --whole-archive, for their autolink, but one already
>> exists.
>
>
> Then they can specify --no-whole-archive on the end of the command line,
> no?
>
>
>> This could be quite difficult to check with a large project.
>> Personally I'd have the user be explicit in the .autolink whether they
>> were intending it to be whole-archive or not.
>>
>
> I was hoping to avoid this as I want to avoid getting into how to specify
> linker specific options in the frontend. If we dislike the idea that the
> state of the command line parser at the end of the linker command line
> affects the autolinked libraries then I would rather go for a scheme in
> which the default state of the command line parser applies when linking the
> autolinked libraries; however, that seems harder to implement in LLD and
> gives the user less control over autolinking.
>
I think that handling .autolink'ed files in the default state is simpler,
and it doesn't seem too hard to implement.

The other option is to handle autolinked libraries as soon as we find them,
so that if foo.o autolinks libbar, the linker would act as if foo.o in the
command line is followed by -lbar. I'd think that's not too bad or arguably
more straightforward semantics than autolinking everything all at once at
the end.
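
The difference between the two orderings can be sketched with a toy model in which each command-line input carries the strings from its own .autolink section. The struct Input and the functions orderAtEnd and orderAsFound are hypothetical names used only for this comparison, and duplicate handling is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of one command-line input (hypothetical, for illustration).
struct Input {
  std::string name;                  // file as given on the command line
  std::vector<std::string> autolink; // strings from its .autolink section
};

// Proposal: autolinked libraries are appended after all command-line inputs.
std::vector<std::string> orderAtEnd(const std::vector<Input> &inputs) {
  std::vector<std::string> order, pending;
  for (const Input &in : inputs) {
    order.push_back(in.name);
    pending.insert(pending.end(), in.autolink.begin(), in.autolink.end());
  }
  order.insert(order.end(), pending.begin(), pending.end());
  return order;
}

// Alternative: each input is immediately followed by its autolinked
// libraries, as if "foo.o" had been written "foo.o -lbar".
std::vector<std::string> orderAsFound(const std::vector<Input> &inputs) {
  std::vector<std::string> order;
  for (const Input &in : inputs) {
    order.push_back(in.name);
    order.insert(order.end(), in.autolink.begin(), in.autolink.end());
  }
  return order;
}
```

For a command line "foo.o libbaz.a" where foo.o autolinks bar, the first function yields foo.o, libbaz.a, bar and the second yields foo.o, bar, libbaz.a, so the alternative places the autolinked library ahead of an explicitly listed one.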

>> > 4. Duplicate autolinked inputs are ignored.
>>
>> If we take the issue of --whole-archive off the table does it matter
>> that there are duplicate libraries? Unresolved symbols will match
>> against the first library.
>
>
> It doesn't matter for libraries in LLD; but, it is important for object
> files. I think that this mechanism should be usable for object files and
> libraries. This is common in ELF linkers - for example the --library
> command line option can be used to link object files.
>

Do you actually often link .o files using -l? It seems a bit of a weird use
of the option. To me, it seems better to limit the ability of autolinking to
link against .so or .a.


Peter Smith via llvm-dev

2019-Mar-14 18:24 UTC

[llvm-dev] RFC: ELF Autolinking

On Thu, 14 Mar 2019 at 17:58, Rui Ueyama <ruiu at google.com> wrote:
>
> On Thu, Mar 14, 2019 at 9:45 AM bd1976 llvm via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> On Thu, Mar 14, 2019 at 3:32 PM Peter Smith <peter.smith at linaro.org> wrote:
>>>
>>> I'm less convinced about --whole-archive as
>>> I think this tends to be a way of structuring the build and would be
>>> best made explicit in the build system. Moreover, what if someone
>>> wants to not use --whole-archive, for their autolink, but one already
>>> exists.
>>
>>
>> Then they can specify --no-whole-archive on the end of the command line,
>> no?
>>
>>>
>>> This could be quite difficult to check with a large project.
>>> Personally I'd have the user be explicit in the .autolink whether they
>>> were intending it to be whole-archive or not.
>>
>>
>> I was hoping to avoid this as I want to avoid getting into how to specify
>> linker specific options in the frontend. If we dislike the idea that the
>> state of the command line parser at the end of the linker command line
>> affects the autolinked libraries then I would rather go for a scheme in
>> which the default state of the command line parser applies when linking
>> the autolinked libraries; however, that seems harder to implement in LLD
>> and gives the user less control over autolinking.
>
>
> I think that handling .autolink'ed files in the default state is
> simpler, and it doesn't seem too hard to implement.
>
> The other option is to handle autolinked libraries as soon as we find them,
> so that if foo.o autolinks libbar, the linker would act as if foo.o in the
> command line is followed by -lbar. I'd think that's not too bad or
> arguably more straightforward semantics than autolinking everything all at
> once at the end.
>

One of the difficulties of having --whole-archive applying to all the
libraries at the end is the case where some libraries need to be
--whole-archive and some do not. If I've understood correctly,
retaining the state at the end of the link line is all or nothing. I'm
torn over whether processing the library as soon as it is found is the
right one or not. Yes it does solve part of the --whole-archive
problem, but actually using it in practice would be difficult. It also
tends to move the autolink libraries ahead of the explicit ones and I
think users would prefer the other way around.

>>>
>>> > 4. Duplicate autolinked inputs are ignored.
>>>
>>> If we take the issue of --whole-archive off the table does it matter
>>> that there are duplicate libraries? Unresolved symbols will match
>>> against the first library.
>>
>>
>> It doesn't matter for libraries in LLD; but, it is important for object
>> files. I think that this mechanism should be usable for object files and
>> libraries. This is common in ELF linkers - for example the --library
>> command line option can be used to link object files.
>
>
> Do you actually often link .o files using -l? It seems a bit of a weird use
> of the option. To me, it seems better to limit the ability of autolinking
> to link against .so or .a.
>
I think you need the explicit : in the namespec to load an object. I
can't say I've seen it used in anger before in an ELF context, but it
may be more useful in an autolink context to effectively (ab)use the
pre-processor into doing conditional inclusion of object files.
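
For reference, the ":" form of a -l namespec mentioned above can be sketched as a function that lists the file names searched for in each library directory. It follows the behaviour documented for GNU ld on ELF systems (":filename" is searched for verbatim, otherwise libNAME.so is tried before libNAME.a); the function name namespecCandidates and the isStatic flag are hypothetical simplifications.

```cpp
#include <cassert>
#include <string>
#include <vector>

// File names looked up in each library search directory for "-l namespec".
// Hypothetical helper for illustration only; isStatic models a static-only
// link in which shared libraries are not considered.
std::vector<std::string> namespecCandidates(const std::string &namespec,
                                            bool isStatic) {
  // ":filename" means: search for exactly this file name.
  if (!namespec.empty() && namespec[0] == ':')
    return {namespec.substr(1)};
  std::vector<std::string> candidates;
  if (!isStatic)
    candidates.push_back("lib" + namespec + ".so");
  candidates.push_back("lib" + namespec + ".a");
  return candidates;
}
```

So -l:foo.o looks for a file called foo.o in the search path, while -lfoo looks for libfoo.so and then libfoo.a.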
>>> I guess it might make a difference if this
>>> feature is implemented in ld.lld and ld.gold, where you'd have
to wrap
>>> the libraries in a start-group, end-group, but is this likely to
>>> happen?
>>
>>
>> I would like the design to be such that it could be implemented by GNU.
>>
>>>
>>>
>>> > 5. The linker tries to add a library or relocatable object
file from each of the strings in a .autolink section by; first, handling the
string as if it was specified on the commandline; second, by looking for the
string in each of the library search paths in turn; third, by looking for a
lib<string>.a or lib<string>.so (depending on the current mode of
the linker) in each of the library search paths.
>>>
>>> There is some precedent for including files and libraries from
>>> linkerscripts
https://sourceware.org/binutils/docs/ld/File-Commands.html#File-Commands
>>> , these distinguish between "-lfile" and
"file". Would this be a
>>> better fit for a ld.bfd interface compatible linker?
>>>
>>
>> I was hoping to avoid GNUism's and use a "general" mechanism. MSVC
>> source code compatibility is a usecase.
>>
>>>
>>> > 6. A new command line option --no-llvm-autolink will tell LLD to
>>> > ignore the .autolink sections.
>>>
>>> Personally I would have thought --no-llvm-autolink would error if it
>>> found a .autolink section, on the grounds that I wanted all the
>>> libraries to be defined on the command-line or linker script rather
>>> than hidden in object files. I would have thought ignoring the
>>> autolink sections would in most cases result in undefined symbols. If
>>> there is a use case for it, perhaps --ignore-llvm-autolink.
>>>
>>
>> The usecase that I had in mind is that you need to override autolinking.
>> To do so you tell the linker to ignore the embedded autolinking
>> information and construct an equivalent command line. I think your
>> proposed --ignore-llvm-autolink is a better name for this option given
>> the intended semantics.
>>
>>>
>>> > Rationale for the above points:
>>> >
>>> > 1. Adding the autolinked inputs last makes the process simple to
>>> > understand from a developers perspective. All linkers are able to
>>> > implement this scheme.
>>> > 2. Error-ing for libraries that are not found seems like better
>>> > behavior than failing the link during symbol resolution.
>>> > 3. It seems useful for the user to be able to apply command line
>>> > options which will affect all of the autolinked input files. There is
>>> > a potential problem of surprise for developers, who might not realize
>>> > that these options would apply to the "invisible" autolinked input
>>> > files; however, despite the potential for surprise, this is easy for
>>> > developers to reason about and gives developers the control that they
>>> > may require.
>>> > 4. Unlike on the command line it is probably easy to include the same
>>> > input file twice via pragmas and might be a pain to fix; think of
>>> > Third-party libraries supplied as binaries.
>>> > 5. This algorithm takes into account all of the different ways that
>>> > ELF linkers find input files. The different search methods are tried
>>> > by the linker in most obvious to least obvious order.
>>> > 6. I considered adding finer grained control over which .autolink
>>> > inputs were ignored (e.g. MSVC has /nodefaultlib:<library>); however,
>>> > I concluded that this is not necessary: if finer control is required
>>> > developers can recreate the same effect autolinking would have had
>>> > using command line options.
>>> >
>>> > Thoughts?
>>> >
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>

bd1976 llvm via llvm-dev

2019-Mar-14 18:43 UTC


[llvm-dev] RFC: ELF Autolinking

On Thu, Mar 14, 2019 at 5:58 PM Rui Ueyama <ruiu at google.com> wrote:
> On Thu, Mar 14, 2019 at 9:45 AM bd1976 llvm via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Thu, Mar 14, 2019 at 3:32 PM Peter Smith <peter.smith at linaro.org>
>> wrote:
>>
>>> Hello,
>>>
>>> I've put some comments on the proposal inline. Having to had to debug
>>> library selection problems where all the libraries are visible on the
>>> linker command line, I would prefer if people didn't embed difficult
>>> to find directives in object files, but I'm guessing in some languages
>>> this is the natural way of adding libraries.
>>>
>>> On Thu, 14 Mar 2019 at 13:08, bd1976 llvm via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>> >
>>> > At Sony we offer autolinking as a feature in our ELF toolchain. We
>>> > would like to see full support for this feature upstream as there is
>>> > anecdotal evidence that it would find use beyond Sony.
>>> >
>>>
>>> I've not got any use of the existing code. Personally I've not come
>>> across anyone wanting this type of feature, but that is also anecdotal
>>> on my part.
>>>
>>> >
>>> > For ELF we need limited autolinking support. Specifically, we only
>>> > need support for "comment lib" pragmas (
>>> > https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017)
>>> > in C/C++ e.g. #pragma comment(lib, "foo"). My suggestion that we keep
>>> > the implementation as lean as possible.
>>> >
>>> > Principles to guide the implementation:
>>> > - Developers should be able to easily understand autolinking behavior.
>>> > - Developers should be able to override autolinking from the linker
>>> > command line.
>>> > - Inputs specified via pragmas should be handled in a general way to
>>> > allow the same source code to work in different environments.
>>> >
>>> > I would like to propose that we focus on autolinking exclusively and
>>> > that we divorce the implementation from the idea of "linker options"
>>> > which, by nature, would tie source code to the vagaries of particular
>>> > linkers. I don't see much value in supporting other linker operations
>>> > so I suggest that the binary representation be a mergable string
>>> > section (SHF_MERGE, SHF_STRINGS), called .autolink, with custom type
>>> > SHT_LLVM_AUTOLINK (0x6fff4c04), and SHF_EXCLUDE set (to avoid the
>>> > contents appearing in the output). The compiler can form this section
>>> > by concatenating the arguments of the "comment lib" pragmas in the
>>> > order they are encountered. Partial (-r, -Ur) links can be handled by
>>> > concatenating .autolink sections with the normal mergeable string
>>> > section rules. The current .linker-options can remain (or be removed);
>>> > but, "comment lib" pragmas for ELF should be lowered to .autolink not
>>> > to .linker-options. This makes sense as there is no linker option that
>>> > "comment lib" pragmas map directly to. As an example,
>>> > #pragma comment(lib, "foo") would result in:
>>> >
>>> > .section ".autolink","eMS",@llvm_autolink,1
>>> >         .asciz "foo"
>>> >
>>> > For LTO, equivalent information to the contents of a the .autolink
>>> > section will be written to the IRSymtab so that it is available to the
>>> > linker for symbol resolution.
>>> >
>>>
>>> I'm not sure I understand the bit about "for symbol resolution". I
>>> think that what you mean is that you will encode the autolink section
>>> using symbols instead of as a section, and the linker is expected to
>>> extract this when it reads the symbol table?
>>>
>>>
>> Whoops... might have used a bit of a colloquialism there; sorry. All I
>> mean is that there will be a method on the IRSymtab that LLD can use to
>> retrieve the same set of strings that would be written into the
>> .autolink section of the relocatable object files by the backend.
>>
>>
>>> > The linker will process the .autolink strings in the following way:
>>> >
>>> > 1. Inputs from the .autolink sections of a relocatable object file are
>>> > added when the linker decides to include that file (which could itself
>>> > be in a library) in the link. Autolinked inputs behave as if they were
>>> > appended to the command line as a group after all other options. As a
>>> > consequence the set of autolinked libraries are searched last to
>>> > resolve symbols.
>>> > 2. It is an error if a file cannot be found for a given string.
>>> > 3. Any command line options in effect at the end of the command line
>>> > parsing apply to autolinked inputs, e.g. --whole-archive.
>>>
>>> I've not got any experience of autolinking as a user, so I'm
>>> struggling a bit with this one. I'm guessing that autolinking is
>>> useful because someone can do the equivalent of #include <library.h>
>>> and #pragma comment lib "library.so" in the same place without having
>>> to fight the build system.
>>
>>
>> Right. Consider that many codebases have multiple build configurations
>> and the linker needs to be given the correct version of a library to use
>> for the particular build configuration. This is often easier to do using
>> the preprocessor than in the build system. Also, if a program is
>> dependent on an external library, autolinking allows the library writer
>> to reorganize how that library is structured transparently to the users
>> of the library. There are notes about utility in
>> https://stackoverflow.com/questions/1685206/pragma-commentlib-xxx-lib-equivalent-under-linux
>> and
>> https://stackoverflow.com/questions/3851956/whats-pragma-comment-lib-lib-glut32-lib?noredirect=1&lq=1
>> .
>>
>>
>>> I'm less convinced about --whole-archive as
>>> I think this tends to be a way of structuring the build and would be
>>> best made explicit in the build system. Moreover, what if someone
>>> wants to not use --whole-archive, for their autolink, but one already
>>> exists.
>>
>>
>> Then they can specify --no-whole-archive on the end of the command line,
>> no?
>>
>>
>>> This could be quite difficult to check with a large project.
>>> Personally I'd have the user be explicit in the .autolink whether they
>>> were intending it to be whole-archive or not.
>>>
>>
>> I was hoping to avoid this as I want to avoid getting into how to specify
>> linker specific options in the frontend. If we dislike the idea that the
>> state of the command line parser at the end of the linker command line
>> affects the autolinked libraries then I would rather go for a scheme in
>> which the default state of the command line parser applies when linking
>> the autolinked libraries; however, that seems harder to implement in LLD
>> and gives the user less control over autolinking.
>>
>
> I think that handling .autolink'ed files in the default state is simpler,
> and it doesn't seem too hard to implement.
>
Right, it is definitely possible to implement. The trade-off is that it is
possibly confusing if options like --whole-archive start applying to the
"invisible" autolinked inputs. OTOH, why not allow command line options to
affect the autolinked inputs? It gives developers some more control at no
cost (apart from the possible confusion).
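
To make the two choices concrete, here is a rough C++ sketch. It is not
LLD code: InputState, AutolinkPolicy and stateForAutolinkedInputs are
hypothetical names, and only --whole-archive / --no-whole-archive are
modelled as positional options.

```cpp
#include <string>
#include <vector>

// Hypothetical positional state tracked while parsing the command line.
struct InputState {
    bool wholeArchive = false;
};

// The two policies under discussion for autolinked inputs.
enum class AutolinkPolicy {
    InheritFinalState,  // RFC point 3: state at the end of the command line
    DefaultState        // alternative: always the parser's default state
};

// Returns the state that would apply to every autolinked input.
InputState stateForAutolinkedInputs(const std::vector<std::string>& args,
                                    AutolinkPolicy policy) {
    InputState state;
    for (const std::string& arg : args) {
        if (arg == "--whole-archive")
            state.wholeArchive = true;
        else if (arg == "--no-whole-archive")
            state.wholeArchive = false;
    }
    if (policy == AutolinkPolicy::DefaultState)
        return InputState{};  // ignore whatever the command line left set
    return state;
}
```

Under InheritFinalState a trailing --no-whole-archive is how a user opts
the autolinked inputs back out; under DefaultState nothing on the command
line can change how they are loaded.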

>
> The other option is to handle autolinked libraries as soon as we find
> them, so that if foo.o autolinks libbar, the linker would act as if foo.o
> in the command line is followed by -lbar. I'd think that's not too bad or
> arguably more straightforward semantics than autolinking everything all at
> once at the end.
>
So I played around with this idea a bit. Some background info:

MSVC searches libraries added via "comment lib" pragmas last, after
searching all of the libraries specified on the command line; however,
symbols that are unresolved when bringing in an object file from a library
are searched for in that library first (
https://docs.microsoft.com/en-us/cpp/build/reference/link-input-files?view=vs-2017
).

In the upstream discussion for autolinking, Cary Coutant offered the
following as a good compromise for traditional ELF linkers (
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120382.html):

"""I think what would work is to insert each requested object or shared
library into the link order immediately after the object that requests
it, but only if the object hasn't already been inserted and isn't
already listed on the command line (i.e., we won't try to load the
same file twice); and to search each requested archive library
immediately after each object that requests it (of course, because of
how library searching works, we would load a given archive member once
at most). With this method, libm would be searched after both a.o and
b.o, so we'd load any members needed by a.o before b.o, and any
remaining members needed by b.o before c.o."""

The problem with what you're suggesting is that with the GNU linkers it is
always possible to define "where" in the command line parsing you are.
However, for MSVC or LLD it is not always possible: think of an object file
in a library that autolinks foo.a and gets pulled into the link (by an
undefined symbol) much later on in the link order. My RFC is careful to try
to set out a scheme that all linkers can implement (as much as is possible).
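
As a rough illustration of the "append at the end" scheme from points 1
and 4 of the RFC, here is a C++ sketch. It is a simplification, not LLD
code: InputFile and buildLinkOrder are hypothetical names, the list passed
in is assumed to already be the set of files the linker decided to load
(including archive members), and duplicates are only detected among the
autolinked strings themselves.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of a loaded input: its name plus the strings from its
// .autolink section (empty when the file has no such section).
struct InputFile {
    std::string name;
    std::vector<std::string> autolink;
};

// Builds the effective link order: every loaded input in its original
// order, followed by the autolinked inputs as one trailing group.
std::vector<std::string> buildLinkOrder(const std::vector<InputFile>& loaded) {
    std::vector<std::string> order;
    for (const InputFile& file : loaded)
        order.push_back(file.name);

    // Point 4: a string requested more than once is only added once.
    std::unordered_set<std::string> seen;
    for (const InputFile& file : loaded) {
        for (const std::string& lib : file.autolink) {
            if (seen.insert(lib).second)
                order.push_back(lib);
        }
    }
    return order;
}
```

The position of the requesting file never matters here, which is why the
scheme does not depend on knowing "where" in the command line the linker is
when an archive member is pulled in late.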
>
>
>>> > 4. Duplicate autolinked inputs are ignored.
>>>
>>> If we take the issue of --whole-archive off the table does it matter
>>> that there are duplicate libraries? Unresolved symbols will match
>>> against the first library.
>>
>>
>> It doesn't matter for libraries in LLD; but, it is important for object
>> files. I think that this mechanism should be usable for object files and
>> libraries. This is common in ELF linkers - for example the --library
>> command line option can be used to link object files.
>>
>>>
> Do you actually often link .o file using -l? It seems a bit weird use of
> the option. To me, it seems better to limit the ability of autolinking to
> link against .so or .a.
>
I don't personally, but it does seem useful to be able to find .o files on
the library search paths.
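
For reference, the three-step lookup from point 5 of the RFC could be
sketched roughly as below. This is not LLD code: resolveAutolink is a
hypothetical name, the exists callback stands in for a real file-system
check, and the choice to try lib<string>.so before lib<string>.a in dynamic
mode (and only lib<string>.a in static mode) is my assumption about what
"depending on the current mode of the linker" means.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Resolves one .autolink string to a path, or returns std::nullopt so the
// caller can report the error required by point 2.
std::optional<std::string> resolveAutolink(
    const std::string& name,
    const std::vector<std::string>& searchPaths,
    bool dynamicMode,
    const std::function<bool(const std::string&)>& exists) {
    // Step 1: handle the string as if it were given on the command line.
    if (exists(name))
        return name;

    // Step 2: look for the literal string in each library search path.
    // This is what lets a .o file be found on the search paths.
    for (const std::string& dir : searchPaths) {
        std::string candidate = dir + "/" + name;
        if (exists(candidate))
            return candidate;
    }

    // Step 3: look for lib<name>.so / lib<name>.a in each search path.
    for (const std::string& dir : searchPaths) {
        if (dynamicMode) {
            std::string shared = dir + "/lib" + name + ".so";
            if (exists(shared))
                return shared;
        }
        std::string archive = dir + "/lib" + name + ".a";
        if (exists(archive))
            return archive;
    }
    return std::nullopt;
}
```

Restricting autolinking to .so and .a files would amount to dropping
steps 1 and 2 and keeping only step 3.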

>>> I guess it might make a difference if this
>>> feature is implemented in ld.lld and ld.gold, where you'd have to wrap
>>> the libraries in a start-group, end-group, but is this likely to
>>> happen?
>>>
>>
>> I would like the design to be such that it could be implemented by GNU.
>>
>>
>>>
>>> > 5. The linker tries to add a library or relocatable object file from
>>> > each of the strings in a .autolink section by; first, handling the
>>> > string as if it was specified on the commandline; second, by looking
>>> > for the string in each of the library search paths in turn; third, by
>>> > looking for a lib<string>.a or lib<string>.so (depending on the
>>> > current mode of the linker) in each of the library search paths.
>>>
>>> There is some precedent for including files and libraries from
>>> linkerscripts
>>> https://sourceware.org/binutils/docs/ld/File-Commands.html#File-Commands
>>> , these distinguish between "-lfile" and "file". Would this be a
>>> better fit for a ld.bfd interface compatible linker?
>>>
>>>
>> I was hoping to avoid GNUism's and use a "general" mechanism. MSVC source
>> code compatibility is a usecase.
>>
>>
>>> > 6. A new command line option --no-llvm-autolink will tell LLD to
>>> > ignore the .autolink sections.
>>>
>>> Personally I would have thought --no-llvm-autolink would error if it
>>> found a .autolink section, on the grounds that I wanted all the
>>> libraries to be defined on the command-line or linker script rather
>>> than hidden in object files. I would have thought ignoring the
>>> autolink sections would in most cases result in undefined symbols. If
>>> there is a use case for it, perhaps --ignore-llvm-autolink.
>>>
>>>
>> The usecase that I had in mind is that you need to override autolinking.
>> To do so you tell the linker to ignore the embedded autolinking
>> information and construct an equivalent command line. I think your
>> proposed --ignore-llvm-autolink is a better name for this option given
>> the intended semantics.
>>
>>
>>> > Rationale for the above points:
>>> >
>>> > 1. Adding the autolinked inputs last makes the process simple to
>>> > understand from a developers perspective. All linkers are able to
>>> > implement this scheme.
>>> > 2. Error-ing for libraries that are not found seems like better
>>> > behavior than failing the link during symbol resolution.
>>> > 3. It seems useful for the user to be able to apply command line
>>> > options which will affect all of the autolinked input files. There is
>>> > a potential problem of surprise for developers, who might not realize
>>> > that these options would apply to the "invisible" autolinked input
>>> > files; however, despite the potential for surprise, this is easy for
>>> > developers to reason about and gives developers the control that they
>>> > may require.
>>> > 4. Unlike on the command line it is probably easy to include the same
>>> > input file twice via pragmas and might be a pain to fix; think of
>>> > Third-party libraries supplied as binaries.
>>> > 5. This algorithm takes into account all of the different ways that
>>> > ELF linkers find input files. The different search methods are tried
>>> > by the linker in most obvious to least obvious order.
>>> > 6. I considered adding finer grained control over which .autolink
>>> > inputs were ignored (e.g. MSVC has /nodefaultlib:<library>); however,
>>> > I concluded that this is not necessary: if finer control is required
>>> > developers can recreate the same effect autolinking would have had
>>> > using command line options.
>>> >
>>> > Thoughts?
>>> >
