thr3ads.net - llvm dev - [llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support [Sep 2018]

Petr Hosek via llvm-dev

2018-Sep-27 23:27 UTC

[llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support

On Thu, Sep 27, 2018 at 3:12 PM Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Thu, Sep 27, 2018 at 2:42 PM Armando Montanez via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Since the goal is to start llvm-tapi more or less from scratch, I feel
>> the best approach initially is to focus on the structure as a key
>> point of feedback in initial reviews. Once the foundations are set,
>> integrating Mach-O TAPI in parallel with the ELF implementation should
>> be relatively straightforward. The features outside of stubbing aren't
>> as appealing for ELF, so I probably won't be working on extending that
>> functionality. With that being said, the overall design goal is
>> generalization/abstraction where possible to welcome feature parity in
>> case it is eventually desired. I'm sure we'll run into things that
>> belong in the tool but end up being uniquely specialized, and it will
>> probably be best to address them on a case-by-case basis.
>>
>
> I'm not very sure what you meant by generalizing it, but given that
>
> 1) implementing a text-based ELF stub format is not appealing, and we
> probably won't implement, and
>
You could use readelf/objdump, but these tools weren't designed with that
use case in mind, and their output isn't adequate for many of the common use
cases we're considering: it's not well-specified, and it's not designed to be
machine-readable (and often not human-readable) or easily diffable. All the
ad-hoc solutions out there, many of which were pointed out in Armando's
proposal, demonstrate the need for a text-based representation. While I
understand your concerns, I think we should focus on the tool and library
itself and leave the discussion of direct linker support for later.

We're also not firmly set on YAML. We've chosen YAML because it's already
used by Apple's Mach-O implementation, but we could consider a different
format and we're open to suggestions. However, given our requirements
(machine- and human-readable, easily diffable), I'm not sure we're going
to come up with something that's significantly different from YAML.
Furthermore, YAML has the advantage of already being supported in a variety
of languages.
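
To make the "easily diffable" requirement concrete, here is a minimal Python
sketch. It is not part of the proposal: the one-symbol-per-line layout, the
sample symbol lists, and the `interface_diff` helper are all assumptions made
for illustration. It only shows that when symbols are sorted and kept on
individual lines, an ordinary line diff reports exactly the interface changes.

```python
import difflib

# Two hypothetical revisions of a stub's symbol list, one symbol per line and
# sorted by name. This layout is an illustration, not the proposed format.
OLD_STUB = """\
fish object 48
foobar function
printf function
"""

NEW_STUB = """\
fish object 64
foobar function
newfunc function
printf function
"""


def interface_diff(old_text: str, new_text: str) -> list:
    """Return only the added ('+') and removed ('-') lines of a unified diff."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="old",
        tofile="new",
        lineterm="",
        n=0,  # no context lines: only the changes themselves
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    for line in interface_diff(OLD_STUB, NEW_STUB):
        print(line)
```

With these sample inputs the diff contains three lines: the old and new size
of `fish`, and the newly added `newfunc`.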

> 2) COFF already has its own (binary) stub format,
>
Do you have a reference that describes the format and the tooling?

> I don't see a point of generalizing it. Isn't it just a Mach-O only thing?
> If (1) is not true, then maybe we should generalize it, so I think you need
> to show evidence that we need a text-based ELF stub format.
>
> On Wed, Sep 26, 2018 at 2:42 PM Steven Wu <stevenwu at apple.com> wrote:
>> >
>> > Hi Armando
>> >
>> > Thanks for the detailed RFC and all the background research. I think
>> > the concept is good and I will be happy to work with you to integrate the
>> > ELF implementation with Apple's MachO implementation and contribute it
>> > upstream. Do you have any proposal on how to integrate with Apple's tapi
>> > and how should we collaborate?
>> >
>> > Also, Apple's tapi does more than just stubbing. Are you interested to
>> > add ELF support for other features as well? (I guess it should not be too
>> > hard to do that).
>> >
>> > Thanks
>> >
>> > Steven
>> >
>> > > On Sep 26, 2018, at 8:29 AM, Armando Montanez via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> > >
>> > > Hello all,
>> > >
>> > > LLVM-TAPI seeks to decouple the necessary link-time information for a
>> > > dynamic shared object from the implementation of the runtime object.
>> > > This process will be referred to as dynamic shared object (DSO)
>> > > stubbing throughout this proposal. A number of projects have
>> > > implemented their own versions of shared object stubbing for a variety
>> > > of reasons related to improving the overall linking experience. This
>> > > functionality is absent from LLVM despite how close the practice is to
>> > > LLVM’s domain. The goal of this project would be to produce a library
>> > > for LLVM that not only provides a means for DSO stubbing, but also
>> > > gives meaningful insight into the contents of these stubs and how they
>> > > change. I’ve collected a few example instances of object stubbing as
>> > > part of larger tools and the key benefits that resulted from them:
>> > >
>> > > - Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve
>> > > build times.
>> > > - Oracle’s Solaris OS linker [2]: Stubbing used to improve build
>> > > times, and improve robustness of build system (against dependency
>> > > cycles and race conditions).
>> > > - Google’s Bazel [3]: Stubbing used to improve build times.
>> > > - Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
>> > > - Android NDK: Stubbing used to reduce size of native sdk, control
>> > > exported symbols, and improve build times.
>> > >
>> > > Somewhat tangentially, a tool called libabigail [6] provides utilities
>> > > for tracking changes relevant to ELF files in a meaningful way. One of
>> > > libabigail’s tools provides very detailed textual XML representations
>> > > of objects, which is especially useful in the absence of a preexisting
>> > > textual representation of shared objects’ exposed interfaces. Glibc
>> > > [7] and libc++ [8] have made an effort to address this in their own
>> > > ways by using scripts to produce textual representations of object
>> > > interfaces. This functionality makes it significantly easier to
>> > > analyze and control symbol visibility, though the existing solutions
>> > > are quite bespoke. Controlling these symbols can have an implicit
>> > > benefit of reducing binary size by pruning visible symbols, but the
>> > > more critical feature is being able to easily view and edit the
>> > > exposed symbols in the first place. Using human-readable stubs
>> > > addresses the issues of DSO analysis and control without requiring
>> > > highly specialized tools. This does not strive to replace tools
>> > > altogether; it just makes small tasks significantly more approachable.
>> > >
>> > > llvm-tapi would strive to be an intersection between a means to
>> > > produce and link against stubs, and providing tools that offer more
>> > > control and insight into the public interfaces of DSOs. More
>> > > fundamentally, llvm-tapi would introduce a library to generate and
>> > > ingest human-readable stubs from DSOs to address these issues directly
>> > > in LLVM. Overall, this idea is most similar to the vein of Apple’s
>> > > TAPI, as the original TAPI also uses human-readable stubs.
>> > >
>> > > In general, llvm-tapi should:
>> > >
>> > > 1. Produce human-readable text files from dynamic shared objects that
>> > > are concise, readable, and contain everything required for linking
>> > > that can’t be implicitly derived.
>> > > 2. Produce linkable files from said human readable text files.
>> > > 3. Provide tools to track and control the exposed interfaces of
>> > > object files.
>> > > 4. Integrate well with LLVM’s existing tools.
>> > > 5. Strive to enable integration of the original TAPI code for Mach-O
>> > > support.
>> > >
>> > > There are a number of key benefits to using stubs and text-based
>> > > application binary interfaces such as:
>> > > - Reducing the size of dynamic shared objects used exclusively for
>> > > linking.
>> > > - The ability to avoid re-linking an object when its dependencies’
>> > > exposed interfaces do not change but their implementation does (which
>> > > happens frequently).
>> > > - Simplicity of viewing a diff for a changed DSO interface.
>> > > A large number of other use cases exist; this would open up the floor
>> > > for a variety of other tools and future work as the concept is rather
>> > > generic.
>> > >
>> > > The proposed YAML format would be analogous to Apple’s .tbd format but
>> > > differ in a few ways to support ELF object types. An example would be
>> > > as follows:
>> > >
>> > > --- !tapi-tbe-v1
>> > > soname: someobj.so
>> > > architecture: aarch64
>> > > symbols:
>> > > - name: fish
>> > >   type: object
>> > >   size: 48
>> > > - name: foobar
>> > >   type: function
>> > >   warning-text: "deprecated in SOMEOBJ_1.3"
>> > > - name: printf
>> > >   type: function
>> > > - name: rndfunc
>> > >   type: function
>> > >   undefined: true
>> > > ...
>> > >
>> > > (Note that this doesn’t account for version sets, but such
>> > > functionality can be included in a later version.)
>> > >
>> > > Most of the fields are self-explanatory, with size not being relevant
>> > > to function symbols, and warning text being purely optional. One
>> > > reason this departs from .tbd format is to make diffs much easier:
>> > > sorting symbols alphabetically on individual lines makes it much more
>> > > obvious which symbols are added, removed, or modified. Despite the
>> > > differences, the desire is for llvm-tapi to be structured such that
>> > > integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior
>> > > discussion [9] indicated interest in integrating Apple TAPI into LLVM,
>> > > so I’d definitely like to leave that door open and encourage that in
>> > > the future.
>> > >
>> > > I feel the best place to start this is as a library to best facilitate
>> > > integration into other areas of LLVM, later wrapping it in a
>> > > standalone tool and eventually considering direct integration into
>> > > LLD. The tool will initially support basic generation of .tbe and stub
>> > > files from .tbe or ELF. This should give enough functionality for
>> > > manually checking shared object interface diffs, as well as having
>> > > access to linkable stubs. The goal is for the tool to eventually
>> > > provide additional functionality such as compatibility checking, but
>> > > that’s a ways into the future.
>> > >
>> > > There are multiple options for integrating llvm-tapi to work with LLD:
>> > > LLD could use llvm-tapi to produce and ingest .tbe files
>> > > directly, or llvm-tapi could be used to produce stubs that LLD can be
>> > > taught to use. From a technical standpoint, these are not mutually
>> > > exclusive. This step is a ways down the road, but is definitely a
>> > > high-priority goal.
>> > >
>> > > I’m interested to hear your thoughts and feedback on this.
>> > >
>> > > Best,
>> > > Armando
>> > >
>> > >
>> > > [1] https://github.com/ributzka/tapi
>> > > [2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
>> > > [3] https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
>> > > [4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
>> > > [5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
>> > > [6] https://sourceware.org/libabigail/
>> > > [7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
>> > > [8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
>> > > [9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576
>> > > _______________________________________________
>> > > LLVM Developers mailing list
>> > > llvm-dev at lists.llvm.org
>> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Rui Ueyama via llvm-dev

2018-Sep-27 23:36 UTC

On Thu, Sep 27, 2018 at 4:27 PM Petr Hosek <phosek at chromium.org> wrote:
> On Thu, Sep 27, 2018 at 3:12 PM Rui Ueyama via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Thu, Sep 27, 2018 at 2:42 PM Armando Montanez via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Since the goal is to start llvm-tapi more or less from scratch, I feel
>>> the best approach initially is to focus on the structure as a key
>>> point of feedback in initial reviews. Once the foundations are set,
>>> integrating Mach-O TAPI in parallel with the ELF implementation should
>>> be relatively straightforward. The features outside of stubbing aren't
>>> as appealing for ELF, so I probably won't be working on extending that
>>> functionality. With that being said, the overall design goal is
>>> generalization/abstraction where possible to welcome feature parity in
>>> case it is eventually desired. I'm sure we'll run into things that
>>> belong in the tool but end up being uniquely specialized, and it will
>>> probably be best to address them on a case-by-case basis.
>>>
>>
>> I'm not very sure what you meant by generalizing it, but given that
>>
>> 1) implementing a text-based ELF stub format is not appealing, and we
>> probably won't implement, and
>>
>
> You could use readelf/objdump, but these tools weren't designed with that
> use case in mind, and their output isn't adequate for many of the common use
> cases we're considering: it's not well-specified, and it's not designed to be
> machine-readable (and often not human-readable) or easily diffable. All the
> ad-hoc solutions out there, many of which were pointed out in Armando's
> proposal, demonstrate the need for a text-based representation. While I
> understand your concerns, I think we should focus on the tool and library
> itself and leave the discussion of direct linker support for later.
>
I don't have an objection to creating a tool to dump a DSO's contents in a
machine-readable format, though it looks like its goal overlaps with the
existing obj2yaml tool, as obj2yaml is intended to convert a native binary
object file to a YAML text file.

> We're also not firmly set on YAML. We've chosen YAML because it's already
> used by Apple's Mach-O implementation, but we could consider a different
> format and we're open to suggestions. However, given our requirements
> (machine- and human-readable, easily diffable), I'm not sure we're going
> to come up with something that's significantly different from YAML.
> Furthermore, YAML has the advantage of already being supported in a variety
> of languages.
>
I guess that YAML is fine. LLVM's YAML reader is kind of slow, but that's
an implementation matter.

>
>> 2) COFF already has its own (binary) stub format,
>>
>
> Do you have a reference that describes the format and the tooling?
>
Maybe this one?
https://docs.microsoft.com/en-us/windows/desktop/dlls/dynamic-link-library-creation

When you create a DLL on Windows, the linker produces two files: a .dll
file and a .lib file. The DLL file contains the actual code and is
dynamically linked to an executable. The LIB file is an archive that
contains a fake object file for each exported symbol, each of which
describes that symbol. When you link your program against a DLL, you
don't link against the DLL directly. Instead, you pass the .lib file
that corresponds to the desired .dll file.

That way, Windows SDKs don't have to include the actual DLL files if they
just want to allow linking against DLLs. (Of course, you need the actual
DLLs to run your program.)
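
As a rough illustration of that idea, here is a toy Python model. It is a
sketch only: `ImportLibrary` and `resolve_imports` are hypothetical names, and
a real .lib is an archive of small import objects rather than a set of
strings. The point it shows is that resolving a program's undefined symbols
needs only the list of exported names and the DLL they come from, not the
DLL's code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ImportLibrary:
    """Toy stand-in for a .lib file: exported names only, no code."""
    dll_name: str
    exports: FrozenSet[str]


def resolve_imports(
    undefined: Iterable[str], libs: List[ImportLibrary]
) -> Tuple[Dict[str, str], List[str]]:
    """Map each undefined symbol to the DLL that exports it.

    Returns (resolved, unresolved); the first library that exports a
    symbol wins, mirroring command-line order.
    """
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    for symbol in undefined:
        for lib in libs:
            if symbol in lib.exports:
                resolved[symbol] = lib.dll_name
                break
        else:
            # No library exported this symbol.
            unresolved.append(symbol)
    return resolved, unresolved


if __name__ == "__main__":
    libs = [
        ImportLibrary("foo.dll", frozenset({"foo_init", "foo_run"})),
        ImportLibrary("bar.dll", frozenset({"bar_open"})),
    ]
    resolved, unresolved = resolve_imports(["foo_run", "bar_open", "baz"], libs)
    print(resolved)
    print(unresolved)
```

Neither `foo.dll` nor `bar.dll` has to exist for this step to succeed, which
is the property that lets an SDK ship stubs instead of full libraries.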

>
>> I don't see a point of generalizing it. Isn't it just a Mach-O only
>> thing? If (1) is not true, then maybe we should generalize it, so I think
>> you need to show evidence that we need a text-based ELF stub format.
>>

Jake Ehrlich via llvm-dev

2018-Sep-28 00:56 UTC

This isn't the same as obj2yaml. It would only contain information relevant
to linking, whereas obj2yaml attempts to be a full textual representation.
Also, calling the output of obj2yaml machine-readable is kind of dubious,
since it has a reasonably complex output format and is *not* an inverse of
yaml2obj, as the name might suggest. No inverse of it exists as far as I am
aware. obj2yaml is better for testing than for reviewing the public
interface of a DSO.
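
To sketch what "only information relevant to linking" could mean in practice,
here is a small Python example. Everything in it is an assumption for
illustration: the `DynSymbol` record is a simplified, hypothetical view of a
dynamic symbol table entry, and the output keys (`name`, `type`, `size`,
`undefined`) simply mirror the example stub in Armando's proposal. It is not
how llvm-tapi or obj2yaml works.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DynSymbol:
    """Hypothetical, simplified record for one dynamic symbol table entry."""
    name: str
    kind: str        # "function" or "object"
    binding: str     # "local", "global", or "weak"
    visibility: str  # "default", "protected", "hidden", or "internal"
    size: int
    defined: bool


def link_interface(symbols: List[DynSymbol]) -> List[dict]:
    """Keep only what another object could link against, sorted by name."""
    entries = []
    for sym in sorted(symbols, key=lambda s: s.name):
        # Local or hidden/internal symbols are not part of the interface.
        if sym.binding == "local" or sym.visibility in ("hidden", "internal"):
            continue
        entry = {"name": sym.name, "type": sym.kind}
        if sym.kind == "object":
            # Size matters for data symbols, not for functions.
            entry["size"] = sym.size
        if not sym.defined:
            entry["undefined"] = True
        entries.append(entry)
    return entries


if __name__ == "__main__":
    table = [
        DynSymbol("printf", "function", "global", "default", 120, True),
        DynSymbol("helper", "function", "local", "default", 16, True),
        DynSymbol("fish", "object", "global", "default", 48, True),
        DynSymbol("rndfunc", "function", "global", "default", 0, False),
    ]
    for entry in link_interface(table):
        print(entry)
```

A full dump would also carry sections, relocations, and local symbols; the
filter above drops all of that, which is what keeps the result small enough to
review by eye.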

On Thu, Sep 27, 2018, 4:37 PM Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Thu, Sep 27, 2018 at 4:27 PM Petr Hosek <phosek at chromium.org>
wrote:
>
>> On Thu, Sep 27, 2018 at 3:12 PM Rui Ueyama via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Thu, Sep 27, 2018 at 2:42 PM Armando Montanez via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Since the goal is to start llvm-tapi more or less from scratch,
I feel
>>>> the best approach initially is to focus on the structure as a
key
>>>> point of feedback in initial reviews. Once the foundations are
set,
>>>> integrating Mach-O TAPI in parallel with the ELF implementation
should
>>>> be relatively straightforward. The features outside of stubbing
aren't
>>>> as appealing for ELF, so I probably won't be working on
extending that
>>>> functionality. With that being said, the overall design goal is
>>>> generalization/abstraction where possible to welcome feature
parity in
>>>> case it is eventually desired. I'm sure we'll run into
things that
>>>> belong in the tool but end up being uniquely specialized, and
it will
>>>> probably be best to address them on a case-by-case basis.
>>>>
>>>
>>> I'm not very sure what you meant by generalizing it, but given
that
>>>
>>> 1) implementing a text-based ELF stub format is not appealing, and
we
>>> probably won't implement, and
>>>
>>
>> You could use readelf/objdump, but these tools weren't designed
with that
>> use-case in mind and their output isn't adequate for many of the
common use
>> cases we're considering: it's not well-specified, it's not
designed to be
>> machine (and often human) readable or easily diffable. All the ad-hoc
>> solutions out there, many of which were pointed out in Armando's
proposal,
>> demonstrate the need for text-based representation. While, I understand
>> your concerns, I think we should focus on the tool and library itself
and
>> leave the discussion of direct linker support for later.
>>
>
> I don't have an objection to creating a tool to dump a DSO contents in
a
> machine-readable format, though looks like its goal overlaps with existing
> obj2yaml tool, as obj2yaml is intended to convert a native binary object
> file to a YAML text file.
>
> We're also not firmly set on YAML. We've chosen YAML because
it's already
>> used by Apple's Mach-O implementation, but we could consider a
different
>> format and we're open to suggestions. However, given our
requirements
>> (machine and human readable, easily diffable) I'm not sure if
we're going
>> to come up with something that's significantly different from YAML.
>> Furthermore, YAML has the advantage of already being supported in
variety
>> of languages.
>>
>
> I guess that YAML is fine. LLVM's YAML reader is kind of slow, but
that's
> an implementation matter.
>
>
>>
>>> 2) COFF already has its own (binary) stub format,
>>>
>>
>> Do you have a reference that describes the format and the tooling?
>>
>
> Maybe this one?
>
https://docs.microsoft.com/en-us/windows/desktop/dlls/dynamic-link-library-creation
>
> When you create a DLL on Windows, the linker produces two files. One is a
> .dll file and the other is a .lib file. The DLL file contains actual code
> and dynamically linked to an executable. The LIB file is an archive file
> that contains fake object files for each exported symbols, each of which
> explains an exported symbol. When you link your program against a DLL, you
> don't directly link against a DLL. Instead, you need to pass a .lib
file
> that corresponds to a desired .dll file.
>
> That way, Windows SDKs don't have to include actual DLL files if you
just
> want to allow linking against DLLs. (Of course you need actual DLLs to run
> your program though.)
>
>
>>
>>> I don't see a point of generalizing it. Isn't it just a
Mach-O only
>>> thing? If (1) is not true, then maybe we should generalize it, so I
think
>>> you need to show evidences that we need a text-based ELF stub
format.
>>>
>>> On Wed, Sep 26, 2018 at 2:42 PM Steven Wu <stevenwu at
apple.com> wrote:
>>>> >
>>>> > Hi Armando
>>>> >
>>>> > Thanks for the detailed RFC and all the background
research. I think
>>>> the concept is good and I will be happy to work with you to
integrate the
>>>> ELF implementation with Apple's MachO implementation and
contribute it
>>>> upstream. Do you have any proposal on how to integrate with
Apple's tapi
>>>> and how should we collaborate?
>>>> >
>>>> > Also, Apple's tapi does more than just stubbing. Are
you interested
>>>> to add ELF support for other features as well? (I guess it
should not be
>>>> too hard to do that).
>>>> >
>>>> > Thanks
>>>> >
>>>> > Steven
>>>> >
>>>> > > On Sep 26, 2018, at 8:29 AM, Armando Montanez via
llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>> > >
>>>> > > Hello all,
>>>> > >
>>>> > > LLVM-TAPI seeks to decouple the necessary link-time
information for
>>>> a
>>>> > > dynamic shared object from the implementation of the
runtime object.
>>>> > > This process will be referred to as dynamic shared
object (DSO)
>>>> > > stubbing throughout this proposal. A number of
projects have
>>>> > > implemented their own versions of shared object
stubbing for a
>>>> variety
>>>> > > of reasons related to improving the overall linking
experience. This
>>>> > > functionality is absent from LLVM despite how close
the practice is
>>>> to
>>>> > > LLVM’s domain. The goal of this project would be to
produce a
>>>> library
>>>> > > for LLVM that not only provides a means for DSO
stubbing, but also
>>>> > > gives meaningful insight into the contents of these
stubs and how
>>>> they
>>>> > > change. I’ve collected a few example instances of
object stubbing as
>>>> > > part of larger tools and the key benefits that
resulted from them:
>>>> > >
>>>> > > - Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve
>>>> > >   build times.
>>>> > > - Oracle’s Solaris OS linker [2]: Stubbing used to improve build
>>>> > >   times, and improve robustness of build system (against dependency
>>>> > >   cycles and race conditions).
>>>> > > - Google’s Bazel [3]: Stubbing used to improve build times.
>>>> > > - Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
>>>> > > - Android NDK: Stubbing used to reduce size of native sdk, control
>>>> > >   exported symbols, and improve build times.
>>>> > >
>>>> > > Somewhat tangentially, a tool called libabigail [6] provides utilities
>>>> > > for tracking changes relevant to ELF files in a meaningful way. One of
>>>> > > libabigail’s tools provides very detailed textual XML representations
>>>> > > of objects, which is especially useful in the absence of a preexisting
>>>> > > textual representation of shared objects’ exposed interfaces. Glibc
>>>> > > [7] and libc++ [8] have made an effort to address this in their own
>>>> > > ways by using scripts to produce textual representations of object
>>>> > > interfaces. This functionality makes it significantly easier to
>>>> > > analyze and control symbol visibility, though the existing solutions
>>>> > > are quite bespoke. Controlling these symbols can have an implicit
>>>> > > benefit of reducing binary size by pruning visible symbols, but the
>>>> > > more critical feature is being able to easily view and edit the
>>>> > > exposed symbols in the first place. Using human-readable stubs
>>>> > > addresses the issues of DSO analysis and control without requiring
>>>> > > highly specialized tools. This does not strive to replace tools
>>>> > > altogether; it just makes small tasks significantly more approachable.
>>>> > >
>>>> > > llvm-tapi would strive to be an intersection between a means to
>>>> > > produce and link against stubs, and providing tools that offer more
>>>> > > control and insight into the public interfaces of DSOs. More
>>>> > > fundamentally, llvm-tapi would introduce a library to generate and
>>>> > > ingest human-readable stubs from DSOs to address these issues directly
>>>> > > in LLVM. Overall, this idea is most similar to the vein of Apple’s
>>>> > > TAPI, as the original TAPI also uses human-readable stubs.
>>>> > >
>>>> > > In general, llvm-tapi should:
>>>> > >
>>>> > > 1. Produce human-readable text files from dynamic shared objects that
>>>> > >    are concise, readable, and contain everything required for linking
>>>> > >    that can’t be implicitly derived.
>>>> > > 2. Produce linkable files from said human readable text files.
>>>> > > 3. Provide tools to track and control the exposed interfaces of
>>>> > >    object files.
>>>> > > 4. Integrate well with LLVM’s existing tools.
>>>> > > 5. Strive to enable integration of the original TAPI code for
>>>> > >    Mach-O support.
>>>> > >
>>>> > > There are a number of key benefits to using stubs and text-based
>>>> > > application binary interfaces such as:
>>>> > > - Reducing the size of dynamic shared objects used exclusively for
>>>> > >   linking.
>>>> > > - The ability to avoid re-linking an object when its dependencies’
>>>> > >   exposed interfaces do not change but their implementation does
>>>> > >   (which happens frequently).
>>>> > > - Simplicity of viewing a diff for a changed DSO interface.
>>>> > > A large number of other use cases exist; this would open up the floor
>>>> > > for a variety of other tools and future work as the concept is rather
>>>> > > generic.
>>>> > >
>>>> > > The proposed YAML format would be analogous to Apple’s .tbd format but
>>>> > > differ in a few ways to support ELF object types. An example would be
>>>> > > as follows:
>>>> > >
>>>> > > --- !tapi-tbe-v1
>>>> > > soname: someobj.so
>>>> > > architecture: aarch64
>>>> > > symbols:
>>>> > > - name: fish
>>>> > >   type: object
>>>> > >   size: 48
>>>> > > - name: foobar
>>>> > >   type: function
>>>> > >   warning-text: "deprecated in SOMEOBJ_1.3"
>>>> > > - name: printf
>>>> > >   type: function
>>>> > > - name: rndfunc
>>>> > >   type: function
>>>> > >   undefined: true
>>>> > > ...
>>>> > >
>>>> > > (Note that this doesn’t account for version sets, but such
>>>> > > functionality can be included in a later version.)
>>>> > >
>>>> > > Most of the fields are self-explanatory, with size not being relevant
>>>> > > to function symbols, and warning text being purely optional. One
>>>> > > reason this departs from .tbd format is to make diffs much easier:
>>>> > > sorting symbols alphabetically on individual lines makes it much more
>>>> > > obvious which symbols are added, removed, or modified. Despite the
>>>> > > differences, the desire is for llvm-tapi to be structured such that
>>>> > > integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior
>>>> > > discussion [9] indicated interest in integrating Apple TAPI into LLVM,
>>>> > > so I’d definitely like to leave that door open and encourage that in
>>>> > > the future.
>>>> > >
>>>> > > I feel the best place to start this is as a library to best facilitate
>>>> > > integration into other areas of LLVM, later wrapping it in a
>>>> > > standalone tool and eventually considering direct integration into
>>>> > > LLD. The tool will initially support basic generation of .tbe and stub
>>>> > > files from .tbe or ELF. This should give enough functionality for
>>>> > > manually checking shared object interface diffs, as well as having
>>>> > > access to linkable stubs. The goal is for the tool to eventually
>>>> > > provide additional functionality such as compatibility checking, but
>>>> > > that’s a ways into the future.
>>>> > >
>>>> > > There are multiple options for integrating llvm-tapi to work with LLD;
>>>> > > LLD could use llvm-tapi to produce and ingest .tbe files directly, or
>>>> > > llvm-tapi could be used to produce stubs that LLD can be taught to
>>>> > > use. From a technical standpoint, these are not mutually exclusive.
>>>> > > This step is a ways down the road, but is definitely a high-priority
>>>> > > goal.
>>>> > >
>>>> > > I’m interested to hear your thoughts and feedback on this.
>>>> > >
>>>> > > Best,
>>>> > > Armando
>>>> > >
>>>> > >
>>>> > > [1] https://github.com/ributzka/tapi
>>>> > > [2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
>>>> > > [3] https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
>>>> > > [4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
>>>> > > [5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
>>>> > > [6] https://sourceware.org/libabigail/
>>>> > > [7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
>>>> > > [8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
>>>> > > [9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576
>>>> > > _______________________________________________
>>>> > > LLVM Developers mailing list
>>>> > > llvm-dev at lists.llvm.org
>>>> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>> >
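The RFC quoted above argues that keeping symbols sorted alphabetically, with one field per line, makes interface diffs easy to read. As a rough illustration only (this is not llvm-tapi code), here is a small Python sketch that renders a symbol list in the shape of the proposed .tbe example and diffs two versions with the standard difflib module. The helper name render_stub, the field ordering, and the "gadget" symbol are assumptions made for the example; the other symbols come from the RFC's sample.

```python
import difflib

# Field order copied from the RFC's example; treating it as fixed is an assumption.
FIELD_ORDER = ("type", "size", "warning-text", "undefined")


def render_stub(soname, architecture, symbols):
    """Return the lines of a .tbe-style text stub, with symbols sorted by name."""
    lines = [
        "--- !tapi-tbe-v1",
        f"soname: {soname}",
        f"architecture: {architecture}",
        "symbols:",
    ]
    for sym in sorted(symbols, key=lambda s: s["name"]):
        lines.append(f"- name: {sym['name']}")
        for field in FIELD_ORDER:
            if field not in sym:
                continue
            value = sym[field]
            if isinstance(value, bool):
                value = "true" if value else "false"
            elif field == "warning-text":
                value = f'"{value}"'  # quote free-form text
            lines.append(f"  {field}: {value}")
    lines.append("...")
    return lines


# Symbols from the RFC's example, deliberately given out of order.
old_symbols = [
    {"name": "printf", "type": "function"},
    {"name": "fish", "type": "object", "size": 48},
    {"name": "rndfunc", "type": "function", "undefined": True},
    {"name": "foobar", "type": "function",
     "warning-text": "deprecated in SOMEOBJ_1.3"},
]

# Hypothetical next release: fish grows, and a made-up object "gadget" appears.
new_symbols = [dict(sym) for sym in old_symbols]
for sym in new_symbols:
    if sym["name"] == "fish":
        sym["size"] = 64
new_symbols.append({"name": "gadget", "type": "object", "size": 8})

old_stub = render_stub("someobj.so", "aarch64", old_symbols)
new_stub = render_stub("someobj.so", "aarch64", new_symbols)

# n=0 drops context lines so only the changed lines remain.
diff = list(difflib.unified_diff(old_stub, new_stub,
                                 fromfile="old.tbe", tofile="new.tbe",
                                 lineterm="", n=0))
# Skip the two file-header lines and the @@ hunk markers.
changes = [line for line in diff[2:] if not line.startswith("@@")]

print("\n".join(changes))
```

Because both stubs are emitted in the same sorted order, the diff touches only the changed size line and the newly added symbol; unchanged symbols such as printf do not show up.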
