thr3ads.net - llvm dev - [llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support [Sep 2018]


Armando Montanez via llvm-dev

2018-Sep-26 15:29 UTC

[llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support

Hello all,

LLVM-TAPI seeks to decouple the necessary link-time information for a
dynamic shared object from the implementation of the runtime object.
This process will be referred to as dynamic shared object (DSO)
stubbing throughout this proposal. A number of projects have
implemented their own versions of shared object stubbing for a variety
of reasons related to improving the overall linking experience. This
functionality is absent from LLVM despite how close the practice is to
LLVM’s domain. The goal of this project would be to produce a library
for LLVM that not only provides a means for DSO stubbing, but also
gives meaningful insight into the contents of these stubs and how they
change. I’ve collected a few example instances of object stubbing as
part of larger tools and the key benefits that resulted from them:

- Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve build times.
- Oracle’s Solaris OS linker [2]: Stubbing used to improve build
times, and improve robustness of build system (against dependency
cycles and race conditions).
- Google’s Bazel [3]: Stubbing used to improve build times.
- Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
- Android NDK: Stubbing used to reduce size of native SDK, control
exported symbols, and improve build times.

Somewhat tangentially, a tool called libabigail [6] provides utilities
for tracking changes relevant to ELF files in a meaningful way. One of
libabigail’s tools provides very detailed textual XML representations
of objects, which is especially useful in the absence of a preexisting
textual representation of shared objects’ exposed interfaces. Glibc
[7] and libc++ [8] have made an effort to address this in their own
ways by using scripts to produce textual representations of object
interfaces. This functionality makes it significantly easier to
analyze and control symbol visibility, though the existing solutions
are quite bespoke. Controlling these symbols can have an implicit
benefit of reducing binary size by pruning visible symbols, but the
more critical feature is being able to easily view and edit the
exposed symbols in the first place. Using human-readable stubs
addresses the issues of DSO analysis and control without requiring
highly specialized tools. This does not strive to replace tools
altogether; it just makes small tasks significantly more approachable.

llvm-tapi would sit at the intersection of two needs: a means to
produce and link against stubs, and tools that offer more control over
and insight into the public interfaces of DSOs. More fundamentally,
llvm-tapi would introduce a library to generate and ingest
human-readable stubs from DSOs to address these issues directly in
LLVM. Overall, this idea is closest in spirit to Apple’s TAPI, as the
original TAPI also uses human-readable stubs.

In general, llvm-tapi should:

1. Produce human-readable text files from dynamic shared objects that
are concise, readable, and contain everything required for linking
that can’t be implicitly derived.
2. Produce linkable files from said human-readable text files.
3. Provide tools to track and control the exposed interfaces of object files.
4. Integrate well with LLVM’s existing tools.
5. Strive to enable integration of the original TAPI code for Mach-O support.

There are a number of key benefits to using stubs and text-based
application binary interfaces such as:
- Reducing the size of dynamic shared objects used exclusively for linking.
- The ability to avoid re-linking an object when its dependencies’
exposed interfaces do not change but their implementation does (which
happens frequently).
- Simplicity of viewing a diff for a changed DSO interface.
A large number of other use cases exist; this would open up the floor
for a variety of other tools and future work as the concept is rather
generic.
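
To make the re-linking benefit concrete, here is a minimal Python sketch
(not part of the proposal). It assumes a build system keeps a digest of
each dependency's exposed interface and re-links only when that digest
changes. The function names and the dictionary layout for symbols are
hypothetical and chosen purely for illustration.

```python
import hashlib


def interface_digest(soname, arch, symbols):
    """Return a SHA-256 hex digest of the link-relevant interface.

    `symbols` is a list of dicts with a "name" and "type" key and optional
    "size" and "undefined" keys (a hypothetical layout for this sketch).
    Symbols are sorted by name so that the digest does not depend on the
    order in which they were collected.
    """
    lines = [f"soname={soname}", f"arch={arch}"]
    for sym in sorted(symbols, key=lambda s: s["name"]):
        fields = [
            sym["name"],
            sym["type"],
            str(sym.get("size", "")),
            str(sym.get("undefined", False)),
        ]
        lines.append("\t".join(fields))
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def needs_relink(old_digest, soname, arch, symbols):
    """True if the dependency's interface differs from the recorded digest."""
    return interface_digest(soname, arch, symbols) != old_digest
```

A change that only touches the implementation of the DSO leaves the
symbol list, and therefore the digest, untouched, so dependents could
skip the link step.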

The proposed YAML format would be analogous to Apple’s .tbd format but
differ in a few ways to support ELF object types. An example would be
as follows:

--- !tapi-tbe-v1
soname: someobj.so
architecture: aarch64
symbols:
 - name: fish
   type: object
   size: 48
 - name: foobar
   type: function
   warning-text: "deprecated in SOMEOBJ_1.3"
 - name: printf
   type: function
 - name: rndfunc
   type: function
   undefined: true
...

(Note that this doesn’t account for version sets, but such
functionality can be included in a later version.)

Most of the fields are self-explanatory, with size not being relevant
to function symbols, and warning text being purely optional. One
reason this departs from .tbd format is to make diffs much easier:
sorting symbols alphabetically on individual lines makes it much more
obvious which symbols are added, removed, or modified. Despite the
differences, the desire is for llvm-tapi to be structured such that
integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior
discussion [9] indicated interest in integrating Apple TAPI into LLVM,
so I’d definitely like to leave that door open and encourage that in
the future.
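
As a rough illustration of the diff argument, the following Python
sketch renders a symbol list in a layout resembling the example above,
sorted by name, and reports only the added and removed lines. It is an
assumption-laden sketch, not the proposed implementation: the helper
names and the dictionary layout are hypothetical, and the output is
built with plain string formatting rather than a real YAML library.

```python
import difflib


def render_stub(soname, arch, symbols):
    """Render a stub as a list of text lines, one field per line.

    Symbols are emitted in alphabetical order so that two stubs for the
    same interface always produce identical text.
    """
    lines = [
        "--- !tapi-tbe-v1",
        f"soname: {soname}",
        f"architecture: {arch}",
        "symbols:",
    ]
    for sym in sorted(symbols, key=lambda s: s["name"]):
        lines.append(f" - name: {sym['name']}")
        lines.append(f"   type: {sym['type']}")
        if "size" in sym:
            lines.append(f"   size: {sym['size']}")
        if sym.get("undefined"):
            lines.append("   undefined: true")
    lines.append("...")
    return lines


def stub_diff(old_lines, new_lines):
    """Return only the added ("+ ") and removed ("- ") lines."""
    return [
        line
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith(("+ ", "- "))
    ]
```

Because every symbol occupies its own lines and the order is fixed,
adding one symbol shows up as a small, self-contained block in the diff
instead of a change in the middle of a long line.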

I feel the best place to start this is as a library to best facilitate
integration into other areas of LLVM, later wrapping it in a
standalone tool and eventually considering direct integration into
LLD. The tool will initially support basic generation of .tbe and stub
files from .tbe or ELF. This should give enough functionality for
manually checking shared object interface diffs, as well as having
access to linkable stubs. The goal is for the tool to eventually
provide additional functionality such as compatibility checking, but
that’s a ways into the future.

There are multiple options for integrating llvm-tapi to work with LLD;
LLD could use llvm-tapi to produce and ingest .tbe files
directly, or llvm-tapi could be used to produce stubs that LLD can be
taught to use. From a technical standpoint, these are not mutually
exclusive. This step is a ways down the road, but is definitely a
high-priority goal.

I’m interested to hear your thoughts and feedback on this.

Best,
Armando


[1] https://github.com/ributzka/tapi
[2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
[3] https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
[4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
[5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
[6] https://sourceware.org/libabigail/
[7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
[8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
[9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576

Rui Ueyama via llvm-dev

2018-Sep-26 16:50 UTC

Hi,

Have you considered writing a tool to strip DSOs so that they contain only
the information needed for dynamic linking? Because the linker uses only
the symbol table and the symbol version table when linking against a DSO,
all the other sections such as .text or .data can be removed from a file
without affecting the output.

Obviously that stripped DSO is not human readable, but it looks like it has a
few merits over inventing a new text description format: (1) you don't need
to invent something new at all, (2) it is backward compatible with existing
linkers and other tools, (3) all the details of the ELF format (such as symbol
versions) are naturally preserved, and (4) it is perhaps faster than reading
text (especially given that the LLVM YAML library is slow). You can make the
tool sort symbols alphabetically, so that it produces the exact same output
for two different files that are semantically equivalent to the linker.


Armando Montanez via llvm-dev

2018-Sep-26 17:03 UTC

Absolutely. The goal of the tool is to produce both textual and binary
DSO stubs. This means you could take a DSO, produce a textual stub,
modify it however you wish, and then produce a linkable binary stub
from that modified .tbe. That, or you could bypass the textual portion
altogether and just produce binary stubs from DSOs. While the textual
format is useful, the goal is to make the tool complete and maximally
applicable by producing ELF stubs as well.

Alphabetical symbol sorting is currently part of the plan as well;
it also makes producing a diff easier.

Steven Wu via llvm-dev

2018-Sep-26 21:42 UTC

Hi Armando

Thanks for the detailed RFC and all the background research. I think the concept
is good, and I will be happy to work with you to integrate the ELF implementation
with Apple's MachO implementation and contribute it upstream. Do you have
a proposal for how to integrate with Apple's tapi and how we should
collaborate?

Also, Apple's tapi does more than just stubbing. Are you interested in adding
ELF support for other features as well? (I guess it should not be too hard to do
that.)

Thanks

Steven
> On Sep 26, 2018, at 8:29 AM, Armando Montanez via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hello all,
> 
> LLVM-TAPI seeks to decouple the necessary link-time information for a
> dynamic shared object from the implementation of the runtime object.
> This process will be referred to as dynamic shared object (DSO)
> stubbing throughout this proposal. A number of projects have
> implemented their own versions of shared object stubbing for a variety
> of reasons related to improving the overall linking experience. This
> functionality is absent from LLVM despite how close the practice is to
> LLVM’s domain. The goal of this project would be to produce a library
> for LLVM that not only provides a means for DSO stubbing, but also
> gives meaningful insight into the contents of these stubs and how they
> change. I’ve collected a few example instances of object stubbing as
> part of larger tools and the key benefits that resulted from them:
> 
> - Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve build
times.
> - Oracle’s Solaris OS linker [2]: Stubbing used to improve build
> times, and improve robustness of build system (against dependency
> cycles and race conditions).
> - Google’s Bazel [3]: Stubbing used to improve build times.
> - Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
> - Android NDK: Stubbing used to reduce size of native sdk, control
> exported symbols, and improve build times.
> 
> Somewhat tangentially, a tool called libabigail [6] provides utilities
> for tracking changes relevant to ELF files in a meaningful way. One of
> libabigai’s tools provides very detailed textual XML representations
> of objects, which is especially useful in the absence of a preexisting
> textual representation of shared objects’ exposed interfaces. Glibc
> [7] and libc++ [8] have made an effort to address this in their own
> ways by using scripts to produce textual representations of object
> interfaces. This functionality makes it significantly easier to
> analyze and control symbol visibility, though the existing solutions
> are quite bespoke. Controlling these symbols can have an implicit
> benefit of reducing binary size by pruning visible symbols, but the
> more critical feature is being able to easily view and edit the
> exposed symbols in the first place. Using human-readable stubs
> addresses the issues of DSO analysis and control without requiring
> highly specialized tools. This does not strive to replace tools
> altogether; it just makes small tasks significantly more approachable.
> 
> llvm-tapi would strive to be an intersection between a means to
> produce and link against stubs, and providing tools that offer more
> control and insight into the public interfaces of DSOs. More
> fundamentally, llvm-tapi would introduce a library to generate and
> ingest human-readable stubs from DSOs to address these issues directly
> in LLVM. Overall, this idea is most similar to the vein of Apple’s
> TAPI, as the original TAPI also uses human-readable stubs.
> 
> In general, llvm-tapi should:
> 
> 1. Produce human-readable text files from dynamic shared objects that
> are concise, readable, and contain everything required for linking
> that can’t be implicitly derived.
> 2. Produce linkable files from said human readable text files.
> 3. Provide tools to track and control the exposed interfaces of object
files.
> 4. Integrate well with LLVM’s existing tools.
> 5. Strive to enable integration of the original TAPI code for Mach-O
support.
> 
> There are a number of key benefits to using stubs and text-based
> application binary interfaces such as:
> - Reducing the size of dynamic shared objects used exclusively for linking.
> - The ability to avoid re-linking an object when its dependencies’
> exposed interfaces do not change but their implementation does (which
> happens frequently).
> - Simplicity of viewing a diff for a changed DSO interface.
> A large number of other use cases exist; this would open up the floor
> for a variety of other tools and future work as the concept is rather
> generic.
> 
> The proposed YAML format would be analogous to Apple’s .tbd format but
> differ in a few ways to support ELF object types. An example would be
> as follows:
> 
> --- !tapi-tbe-v1
> soname: someobj.so
> architecture: aarch64
> symbols:
> - name: fish
>   type: object
>   size: 48
> - name: foobar
>   type: function
>   warning-text: “deprecated in SOMEOBJ_1.3”
> - name: printf
>   type: function
> - name: rndfunc
>   type: function
>   undefined: true
> ...
> 
> (Note that this doesn’t account for version sets, but such
> functionality can be included in a later version.)
> 
> Most of the fields are self-explanatory, with size not being relevant
> to function symbols, and warning text being purely optional. One
> reason this departs from .tbd format is to make diffs much easier:
> sorting symbols alphabetically on individual lines makes it much more
> obvious which symbols are added, removed, or modified. Despite the
> differences, the desire is for llvm-tapi to be structured such that
> integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior
> discussion [9] indicated interest in integrating Apple TAPI into LLVM,
> so I’d definitely like to leave that door open and encourage that in
> the future.
> 
> I feel the best place to start this is as a library to best facilitate
> integration into other areas of LLVM, later wrapping it in a
> standalone tool and eventually considering direct integration into
> LLD. The tool will initially support basic generation of .tbe and stub
> files from .tbe or ELF. This should give enough functionality for
> manually checking shared object interface diffs, as well as having
> access to linkable stubs. The goal is for the tool to eventually
> provide additional functionality such as compatibility checking, but
> that’s a ways into the future.shared
> 
> There’s multiple options for integrating llvm-tapi to work with LLD;
> LLD could directly use llvm-tapi to produce and ingest .tbe files
> directly, or llvm-tapi could be used to produce stubs that LLD can be
> taught to use. From a technical standpoint, these are not mutually
> exclusive. This step is a ways down the road, but is definitely a
> high-priority goal.
> 
> I’m interested to hear your thoughts and feedback on this.
> 
> Best,
> Armando
> 
> 
> [1] https://github.com/ributzka/tapi
> [2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
> [3]
https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
> [4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
> [5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
> [6] https://sourceware.org/libabigail/
> [7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
> [8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
> [9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
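
As an illustration of the diffing argument in the quoted proposal (per-symbol entries make added, removed, and modified symbols easy to spot), here is a minimal Python sketch. The dict layout, the function name diff_symbols, and the sample data are assumptions made for illustration; they are not part of llvm-tapi or the proposed .tbe format.

```python
# Illustrative sketch only: the dict layout below is an assumption,
# not the llvm-tapi data model.

def diff_symbols(old, new):
    """Compare two {name: attributes} symbol tables.

    Returns (added, removed, modified) as alphabetically sorted name lists.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return added, removed, modified


# Hypothetical interface of someobj.so before and after a change.
old_interface = {
    "fish": {"type": "object", "size": 48},
    "foobar": {"type": "function"},
    "printf": {"type": "function"},
}
new_interface = {
    "fish": {"type": "object", "size": 64},  # size changed
    "printf": {"type": "function"},
    "rndfunc": {"type": "function"},         # newly exported
}

added, removed, modified = diff_symbols(old_interface, new_interface)
print("added:   ", added)
print("removed: ", removed)
print("modified:", modified)
```

A real tool would presumably also need to decide which changes are compatible (for example, adding a symbol) and which are not (removing one or changing an object's size); that policy is left out here.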

Armando Montanez via llvm-dev

2018-Sep-27 21:42 UTC

[llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support

Since the goal is to start llvm-tapi more or less from scratch, I feel
the best approach initially is to focus on the structure as a key
point of feedback in initial reviews. Once the foundations are set,
integrating Mach-O TAPI in parallel with the ELF implementation should
be relatively straightforward. The features outside of stubbing aren't
as appealing for ELF, so I probably won't be working on extending that
functionality. With that being said, the overall design goal is
generalization/abstraction where possible to welcome feature parity in
case it is eventually desired. I'm sure we'll run into things that
belong in the tool but end up being uniquely specialized, and it will
probably be best to address them on a case-by-case basis.

On Wed, Sep 26, 2018 at 2:42 PM Steven Wu <stevenwu at apple.com> wrote:
>
> Hi Armando
>
> Thanks for the detailed RFC and all the background research. I think the
> concept is good and I will be happy to work with you to integrate the ELF
> implementation with Apple's MachO implementation and contribute it upstream.
> Do you have a proposal for how to integrate with Apple's tapi and how
> we should collaborate?
>
> Also, Apple's tapi does more than just stubbing. Are you interested in
> adding ELF support for other features as well? (I guess it should not be
> too hard to do that.)
>
> Thanks
>
> Steven
>
> > On Sep 26, 2018, at 8:29 AM, Armando Montanez via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Hello all,
> >
> > LLVM-TAPI seeks to decouple the necessary link-time information for a
> > dynamic shared object from the implementation of the runtime object.
> > This process will be referred to as dynamic shared object (DSO)
> > stubbing throughout this proposal. A number of projects have
> > implemented their own versions of shared object stubbing for a variety
> > of reasons related to improving the overall linking experience. This
> > functionality is absent from LLVM despite how close the practice is to
> > LLVM’s domain. The goal of this project would be to produce a library
> > for LLVM that not only provides a means for DSO stubbing, but also
> > gives meaningful insight into the contents of these stubs and how they
> > change. I’ve collected a few example instances of object stubbing as
> > part of larger tools and the key benefits that resulted from them:
> >
> > - Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve build times.
> > - Oracle’s Solaris OS linker [2]: Stubbing used to improve build
> > times, and improve robustness of build system (against dependency
> > cycles and race conditions).
> > - Google’s Bazel [3]: Stubbing used to improve build times.
> > - Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
> > - Android NDK: Stubbing used to reduce size of native SDK, control
> > exported symbols, and improve build times.
> >
> > Somewhat tangentially, a tool called libabigail [6] provides utilities
> > for tracking changes relevant to ELF files in a meaningful way. One of
> > libabigail’s tools provides very detailed textual XML representations
> > of objects, which is especially useful in the absence of a preexisting
> > textual representation of shared objects’ exposed interfaces. Glibc
> > [7] and libc++ [8] have made an effort to address this in their own
> > ways by using scripts to produce textual representations of object
> > interfaces. This functionality makes it significantly easier to
> > analyze and control symbol visibility, though the existing solutions
> > are quite bespoke. Controlling these symbols can have an implicit
> > benefit of reducing binary size by pruning visible symbols, but the
> > more critical feature is being able to easily view and edit the
> > exposed symbols in the first place. Using human-readable stubs
> > addresses the issues of DSO analysis and control without requiring
> > highly specialized tools. This does not strive to replace tools
> > altogether; it just makes small tasks significantly more approachable.
> >
> > llvm-tapi would strive to be an intersection between a means to
> > produce and link against stubs, and providing tools that offer more
> > control and insight into the public interfaces of DSOs. More
> > fundamentally, llvm-tapi would introduce a library to generate and
> > ingest human-readable stubs from DSOs to address these issues directly
> > in LLVM. Overall, this idea is most similar to the vein of Apple’s
> > TAPI, as the original TAPI also uses human-readable stubs.
> >
> > In general, llvm-tapi should:
> >
> > 1. Produce human-readable text files from dynamic shared objects that
> > are concise, readable, and contain everything required for linking
> > that can’t be implicitly derived.
> > 2. Produce linkable files from said human-readable text files.
> > 3. Provide tools to track and control the exposed interfaces of object files.
> > 4. Integrate well with LLVM’s existing tools.
> > 5. Strive to enable integration of the original TAPI code for Mach-O support.
> >
> > There are a number of key benefits to using stubs and text-based
> > application binary interfaces such as:
> > - Reducing the size of dynamic shared objects used exclusively for linking.
> > - The ability to avoid re-linking an object when its dependencies’
> > exposed interfaces do not change but their implementation does (which
> > happens frequently).
> > - Simplicity of viewing a diff for a changed DSO interface.
> > A large number of other use cases exist; this would open up the floor
> > for a variety of other tools and future work as the concept is rather
> > generic.
> >
> > The proposed YAML format would be analogous to Apple’s .tbd format but
> > differ in a few ways to support ELF object types. An example would be
> > as follows:
> >
> > --- !tapi-tbe-v1
> > soname: someobj.so
> > architecture: aarch64
> > symbols:
> > - name: fish
> >   type: object
> >   size: 48
> > - name: foobar
> >   type: function
> >   warning-text: "deprecated in SOMEOBJ_1.3"
> > - name: printf
> >   type: function
> > - name: rndfunc
> >   type: function
> >   undefined: true
> > ...
> >
> > (Note that this doesn’t account for version sets, but such
> > functionality can be included in a later version.)
> >
> > Most of the fields are self-explanatory, with size not being relevant
> > to function symbols, and warning text being purely optional. One
> > reason this departs from .tbd format is to make diffs much easier:
> > sorting symbols alphabetically on individual lines makes it much more
> > obvious which symbols are added, removed, or modified. Despite the
> > differences, the desire is for llvm-tapi to be structured such that
> > integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior
> > discussion [9] indicated interest in integrating Apple TAPI into LLVM,
> > so I’d definitely like to leave that door open and encourage that in
> > the future.
> >
> > I feel the best place to start this is as a library to best facilitate
> > integration into other areas of LLVM, later wrapping it in a
> > standalone tool and eventually considering direct integration into
> > LLD. The tool will initially support basic generation of .tbe and stub
> > files from .tbe or ELF. This should give enough functionality for
> > manually checking shared object interface diffs, as well as having
> > access to linkable stubs. The goal is for the tool to eventually
> > provide additional functionality such as compatibility checking, but
> > that’s a ways into the future.
> >
> > There are multiple options for integrating llvm-tapi to work with LLD:
> > LLD could use llvm-tapi to produce and ingest .tbe files directly,
> > or llvm-tapi could be used to produce stubs that LLD can be
> > taught to use. From a technical standpoint, these are not mutually
> > exclusive. This step is a ways down the road, but is definitely a
> > high-priority goal.
> >
> > I’m interested to hear your thoughts and feedback on this.
> >
> > Best,
> > Armando
> >
> >
> > [1] https://github.com/ributzka/tapi
> > [2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
> > [3] https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
> > [4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
> > [5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
> > [6] https://sourceware.org/libabigail/
> > [7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
> > [8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
> > [9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576
>
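
The example format in the quoted proposal can be produced mechanically. The following is a rough Python sketch, not the llvm-tapi implementation: emit_tbe and its input layout are hypothetical, and the output only mimics the field names shown in the quoted example (soname, architecture, symbols, name, type, size, warning-text, undefined). Symbols are sorted by name so that successive outputs diff cleanly.

```python
# Hypothetical emitter for text shaped like the proposal's example.
# This is an illustration, not the format's specification.

def emit_tbe(soname, architecture, symbols):
    """Render symbols as text in the shape of the proposed example.

    `symbols` is a list of dicts with required "name" and "type" keys and
    optional "size", "warning-text", and "undefined" keys.
    """
    lines = [
        "--- !tapi-tbe-v1",
        f"soname: {soname}",
        f"architecture: {architecture}",
        "symbols:",
    ]
    # Alphabetical order keeps the output stable between runs.
    for sym in sorted(symbols, key=lambda s: s["name"]):
        lines.append(f"- name: {sym['name']}")
        lines.append(f"  type: {sym['type']}")
        if "size" in sym:
            lines.append(f"  size: {sym['size']}")
        warning = sym.get("warning-text")
        if warning is not None:
            lines.append(f'  warning-text: "{warning}"')
        if sym.get("undefined"):
            lines.append("  undefined: true")
    lines.append("...")
    return "\n".join(lines) + "\n"


# Usage with the symbols from the proposal's example, given out of order.
example = emit_tbe(
    "someobj.so",
    "aarch64",
    [
        {"name": "printf", "type": "function"},
        {"name": "rndfunc", "type": "function", "undefined": True},
        {"name": "fish", "type": "object", "size": 48},
        {"name": "foobar", "type": "function",
         "warning-text": "deprecated in SOMEOBJ_1.3"},
    ],
)
print(example, end="")
```

The sketch does no escaping of names or warning text, so it is only suitable for simple inputs like the ones shown.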
