On Wed, Jun 23, 2021 at 3:43 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>
> On Tue, Jun 22, 2021 at 11:09 PM Fāng-ruì Sòng via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> On Tue, Jun 22, 2021 at 10:20 PM Petr Hosek <phosek at google.com> wrote:
>> >
>> > I guess this depends on a particular implementation of the distributed
>> > build system. In the case of Goma, we only supply the compiler binary
>> > which was invoked as the command (that binary links glibc as a shared
>> > library but we assume that one is supplied by the host system); all
>> > other files, like headers, are passed together with the compiler
>> > invocation as inputs. If we used dynamic linking, Goma would need to
>> > figure out what other shared libraries need to be sent to the server.
>> > It's certainly doable but it's extra complexity we would like to avoid.
>>
>> For non-clang executables, -DLLVM_LINK_LLVM_DYLIB=on just adds one
>> more DT_NEEDED.
>> The DT_NEEDED entry can use a $ORIGIN based DT_RUNPATH. Can Goma
>> detect the libraries shipped with the tools?
>> I asked because I feel this could be an artificial limitation which
>> could be straightforwardly addressed in Goma.
>> A toolchain executable using an accompanying shared object is not rare
>> (thinking of plugins).
>>
>> Multiplexing LLVM tools is one alternative but I am a bit concerned
>> with the extra complexity and the new configuration the build system
>> needs to support.
>>
>> https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
>> mentioned another approach which doesn't require intrusive
>> modification to the tools.
>>
>> As for PGO+LTO, you can apply them to libLLVM-13git.so as well.
>
>
> Some thoughts: if we're getting into PGO+LTO territory, I feel that both
> methods presented here will be at a disadvantage compared to building
> clang and lld into their own binaries.
> For example, I remember that on Mac an important optimization for clang
> builds was to order the functions in the binary roughly in the order in
> which they are first encountered during execution. Assuming the same
> behavior for lld, you can see the conflicting optimization goal... You
> can also think about how libSupport may be differently "hot" in a clang
> PGO profile compared to lld, which would result in different optimization.
If PGO+LTO is desired, the executables can be split this way, assuming
the performance of
llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}
doesn't matter.
* clang (libLLVM*.a)
* lld + llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer} (libLLVM-13git.so)
> LTO also benefits from "internalizing": building a static binary where
> only `main` is exported and everything else gets internal linkage is the
> best case, since pointer escaping, global analysis, etc. all become more
> powerful. Optimizing a shared library more or less makes every symbol
> public, and I suspect the busybox approach may be better in this respect
> (you get back to a single public main, though it can reach much more code).
With --version-script we can internalize shared object symbols as
well. For example, this has been used to facilitate whole-program
devirtualization (https://reviews.llvm.org/D98686).
With https://lists.llvm.org/pipermail/llvm-dev/2021-June/151338.html
we can get a list of roots which need to be exported.
A thin executable plus a -fvisibility-inlines-hidden +
-Bsymbolic-functions shared object is almost identical to a PIE.
>
>>
>>
>> > On Tue, Jun 22, 2021 at 10:09 PM David Blaikie <dblaikie at gmail.com> wrote:
>> >>
>> >> On Tue, Jun 22, 2021 at 10:00 PM Petr Hosek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >>>
>> >>> From our perspective as a toolchain vendor, even if using shared
>> >>> libraries could get us closer to static linking in terms of
>> >>> performance, we'd still prefer static linking for the ease of
>> >>> distribution. Dealing with a single statically linked executable is
>> >>> much easier than dealing with multiple shared libraries. This is
>> >>> especially important in distributed compilation environments like Goma.
>> >>
>> >>
>> >> What makes it especially complicated for distributed compilation
>> >> environments? (I'd expect a toolchain contains so many files that
>> >> whether it's one binary, or a binary and a handful of shared libraries
>> >> wouldn't change the general implementation complexity of a distributed
>> >> build system?)
>> >>
>> >>>
>> >>>
>> >>> When comparing performance between static and dynamic linking, I'd
>> >>> also recommend doing a comparison between binaries built with PGO+LTO.
>> >>> Plain -O3 leaves a lot of performance on the table and as far as I'm
>> >>> aware, most toolchain vendors use PGO+LTO.
>> >>>
>> >>> On Tue, Jun 22, 2021 at 5:00 PM Fangrui Song via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >>>>
>> >>>> On 2021-06-22, Leonard Chan via llvm-dev wrote:
>> >>>> >Small update: I have a WIP prototype of the tool at
>> >>>> >https://reviews.llvm.org/D104686. The prototype only includes
>> >>>> >llvm-objcopy and llvm-objdump packed together, but we're seeing size
>> >>>> >benefits from busyboxing those two compared against having two
>> >>>> >separate tools. (More details in the prototype's description.) I
>> >>>> >don't plan on landing this as-is anytime soon and there are still
>> >>>> >some things I'd like to improve/change and get feedback on.
>> >>>> >
>> >>>> >To answer some replies:
>> >>>> >
>> >>>> >- Ideally, we could start off with an incremental approach and not
>> >>>> >package large tools like clang/lld off the bat. The llvm-* tools seem
>> >>>> >like a good place to start since they're generally a bunch of
>> >>>> >relatively small binaries that all share a subset of functions in
>> >>>> >libLLVM, but don't necessarily use all of libLLVM, so statically
>> >>>> >linking them together (with --gc-sections) can help dedup a lot of
>> >>>> >shared components vs having separate statically compiled tools. In my
>> >>>> >measurements, the busybox tool containing llvm-objcopy+objdump is
>> >>>> >negligibly larger than llvm-objdump on its own (a couple KB
>> >>>> >difference), indicating a lot of shared code between objdump and
>> >>>> >objcopy.
>> >>>> >
>> >>>> >- Will Dietz's multiplexing tool looks like a good place to start
>> >>>> >from. The only concern I can see though is mostly the amount of work
>> >>>> >needed to update it to LLVM 13.
>> >>>> >
>> >>>> >- We don't have plans for Windows support now, but it's not off the
>> >>>> >table. (Been mostly focusing on *nix for now). Depending on overall
>> >>>> >traction for this idea, we could approach incrementally and add
>> >>>> >support for different platforms over time.
>> >>>>
>> >>>> -DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on -DLLVM_TARGETS_TO_BUILD=X86 (custom1)
>> >>>> vs
>> >>>> -DLLVM_TARGETS_TO_BUILD=X86 (custom2)
>> >>>>
>> >>>>
>> >>>> # This is the lower bound for any multiplexing approach. clang is the largest executable.
>> >>>> % stat -c %s /tmp/out/custom2/bin/clang-13
>> >>>> 102900408
>> >>>>
>> >>>> I have built clang, lld and a bunch of ELF binary utilities.
>> >>>>
>> >>>> % stat -c %s /tmp/out/custom1/lib/libLLVM-13git.so /tmp/out/custom1/lib/libclang-cpp.so.13git /tmp/out/custom1/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'
>> >>>> 138896544
>> >>>>
>> >>>> % stat -c %s /tmp/out/custom2/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'
>> >>>> 209054440
>> >>>>
>> >>>>
>> >>>> The -DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on build is
>> >>>> doing a really good job.
>> >>>>
>> >>>> A multiplexing approach can squeeze some bytes from 138896544 toward
>> >>>> 102900408, but how much can it do?
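
To put the three totals above side by side, here is a small arithmetic sketch. The byte counts are copied from the `stat` output in this message; the constant and function names are made up for illustration. It shows that the dylib build already saves roughly a third of the fully static total, and it computes the remaining gap down to the clang-only lower bound, which is the most any multiplexing scheme could recover.

```python
from typing import Tuple

# Sizes in bytes, copied from the stat output quoted in this thread.
STATIC_CLANG_ONLY = 102_900_408  # custom2: clang-13 alone (the lower bound)
DYLIB_TOTAL = 138_896_544        # custom1: shared libraries + thin executables
STATIC_TOTAL = 209_054_440       # custom2: all tools linked statically


def savings(before: int, after: int) -> Tuple[int, float]:
    """Return (bytes saved, fraction of `before` saved)."""
    saved = before - after
    return saved, saved / before


# What the dylib build saves relative to fully static tools.
dylib_saved, dylib_frac = savings(STATIC_TOTAL, DYLIB_TOTAL)
# The most a multiplexed binary could still save relative to the dylib build.
headroom, headroom_frac = savings(DYLIB_TOTAL, STATIC_CLANG_ONLY)

print(f"dylib vs. static: {dylib_saved} bytes ({dylib_frac:.1%})")
print(f"upper bound on further savings: {headroom} bytes ({headroom_frac:.1%})")
```

The second figure is an upper bound only: it assumes every other tool adds zero bytes on top of clang, which a real multiplexed binary would not achieve.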
>> >>>>
>> >>>>
>> >>>> >- I'm starting to think the `cl::opt` to `OptTable` issue might be
>> >>>> >orthogonal to the busybox implementation. The tool essentially
>> >>>> >dispatches to different "main" functions in different tools, but as
>> >>>> >long as we don't do anything within busybox after exiting that tool's
>> >>>> >main, then the global state issues we weren't sure of with `cl::opt`
>> >>>> >might not be of any concern now. It may be an issue down the line if,
>> >>>> >let's say, the tool flags moved from being "owned" by the tools
>> >>>> >themselves to instead being "owned" by busybox, and then we'd have to
>> >>>> >merge similarly-named flags together. In that case, migrating these
>> >>>> >tools to use `OptTable` may be necessary since (I think) `OptTable`
>> >>>> >should handle this. This may be a tedious task, but this is just to
>> >>>> >say that busybox won't need to be immediately blocked on it.
>> >>>>
>> >>>> Such an improvement is useful even if we don't do multiplexing.
>> >>>> I switched llvm-symbolizer. thakis switched llvm-objdump.
>> >>>> I can look at some binary utilities.
>> >>>>
>> >>>> >- I haven't seen any issues with colliding symbols when linking
>> >>>> >(although I've only merged two tools for now). I suspect that with
>> >>>> >small-ish llvm-* tools, the bulk of their code is shared from
>> >>>> >libLLVM, and they have their own distinct logic built on top of it,
>> >>>> >which could mean a low chance of conflicting internal ABIs.
>> >>>> >
>> >>>> >On Mon, Jun 21, 2021 at 10:54 AM Leonard Chan <leonardchan at google.com> wrote:
>> >>>> >
>> >>>> >> Hello all,
>> >>>> >>
>> >>>> >> When building LLVM tools, including Clang and lld, it's currently
>> >>>> >> possible to use either static or shared linking for LLVM libraries.
>> >>>> >> The latter can significantly reduce the size of the toolchain since
>> >>>> >> we aren't duplicating the same code in every binary, but the dynamic
>> >>>> >> relocations can affect performance. The former doesn't affect
>> >>>> >> performance but significantly increases the size of our toolchain.
>> >>>> >>
>> >>>> >> We would like to implement support for a third approach which we
>> >>>> >> call, for lack of a better term, the "busybox" feature, where
>> >>>> >> everything is compiled into a single binary which then dispatches
>> >>>> >> into an appropriate tool depending on the first command. This
>> >>>> >> approach can significantly reduce the size by deduplicating all of
>> >>>> >> the shared code without affecting the performance.
>> >>>> >>
>> >>>> >> In terms of implementation, the build would produce a single binary
>> >>>> >> called `llvm` and the first command would identify the tool. For
>> >>>> >> example, instead of invoking `llvm-nm` you'd invoke `llvm nm`.
>> >>>> >> Ideally we would also support creation of an `llvm-nm` symlink which
>> >>>> >> redirects to `llvm` for backwards compatibility.
>> >>>> >> This functionality would ideally be implemented as an option in the
>> >>>> >> CMake build that toolchain vendors can opt into.
>> >>>> >>
>> >>>> >> The implementation would have to replace the `main` function of each
>> >>>> >> tool with a regular entrypoint function which is registered into a
>> >>>> >> tool registry. This could be wrapped in a macro for convenience.
>> >>>> >> When the "busybox" feature is disabled, the macro would expand to a
>> >>>> >> `main` function as before and redirect to the entrypoint function.
>> >>>> >> When the "busybox" feature is enabled, it would register the
>> >>>> >> entrypoint function into the registry, which would be responsible
>> >>>> >> for the dispatching based on the tool name. Ideally, toolchain
>> >>>> >> maintainers would also be able to control which tools they could add
>> >>>> >> to the "busybox" binary via CMake build options, so toolchains will
>> >>>> >> only include the tools they use.
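
The registry-and-dispatch idea described above can be sketched in a few lines. This is an illustration of the control flow only, written in Python rather than the C++ the real tools use. Every name in it (`register_tool`, `dispatch`, the `calls` log, the stub tools) is hypothetical, and the decorator merely stands in for the proposed registration macro. It assumes the tool is chosen either from an `llvm-<tool>` symlink name in `argv[0]` or from the first command after `llvm`.

```python
import os
import sys
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: tool name -> entrypoint taking argv, returning an exit code.
ToolMain = Callable[[List[str]], int]
_REGISTRY: Dict[str, ToolMain] = {}

# Log of (tool, argv) pairs so the demo can show which entrypoint ran.
calls: List[Tuple[str, List[str]]] = []


def register_tool(name: str) -> Callable[[ToolMain], ToolMain]:
    """Decorator standing in for the proposed registration macro."""
    def wrap(entry: ToolMain) -> ToolMain:
        _REGISTRY[name] = entry
        return entry
    return wrap


@register_tool("nm")
def nm_main(argv: List[str]) -> int:
    calls.append(("nm", argv))
    return 0


@register_tool("objdump")
def objdump_main(argv: List[str]) -> int:
    calls.append(("objdump", argv))
    return 0


def dispatch(argv: List[str]) -> int:
    """Select a tool from argv[0] (symlink case) or argv[1] (`llvm nm` case)."""
    invoked_as = os.path.basename(argv[0])
    prefix = "llvm-"
    if invoked_as.startswith(prefix) and invoked_as[len(prefix):] in _REGISTRY:
        # Invoked through a symlink such as llvm-nm: pass argv through unchanged.
        return _REGISTRY[invoked_as[len(prefix):]](argv)
    if len(argv) > 1 and argv[1] in _REGISTRY:
        # Invoked as `llvm nm ...`: drop the subcommand so the tool sees a
        # conventional argv whose first element names the tool.
        return _REGISTRY[argv[1]]([prefix + argv[1]] + argv[2:])
    print("unknown tool", file=sys.stderr)
    return 1
```

For example, `dispatch(["llvm", "nm", "a.o"])` would call `nm_main` with `["llvm-nm", "a.o"]`, while `dispatch(["/usr/bin/llvm-objdump", "-d"])` would call `objdump_main` with its arguments unchanged. Restricting the binary to a chosen subset of tools would then amount to controlling which registrations get compiled in.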
>> >>>> >>
>> >>>> >> One implementation detail we think will be an issue is merging
>> >>>> >> arguments in individual tools that use `cl::opt`. `cl::opt` works by
>> >>>> >> maintaining a global state of flags, but we aren't confident of what
>> >>>> >> the resulting behavior will be when merging them together in the
>> >>>> >> dispatching `main`. What we would like to avoid is having flags used
>> >>>> >> by one specific tool available on other tools. To address this
>> >>>> >> issue, we would like to migrate all tools to use `OptTable`, which
>> >>>> >> doesn't have this issue and is the general direction most tools have
>> >>>> >> already been moving in.
>> >>>> >>
>> >>>> >> A second issue would be resolving symlinks. For example,
>> >>>> >> llvm-objcopy will check argv[0] and behave as llvm-strip (i.e. use
>> >>>> >> the right flags + configuration) if it is called via a symlink that
>> >>>> >> “looks like” a strip tool, but for all other cases it will run under
>> >>>> >> the default objcopy mode. The “looks like” function is usually an
>> >>>> >> `Is` function copied in multiple tools that is essentially a
>> >>>> >> substring check: so symlinks like `llvm-strip`, `strip.exe`, and
>> >>>> >> `gnu-llvm-strip-10` all result in using the strip “mode” while all
>> >>>> >> other names use the objcopy mode. To replicate the same behavior, we
>> >>>> >> will need to take great care in making sure symlinks to the busybox
>> >>>> >> tool dispatch correctly to the appropriate llvm tool, which might
>> >>>> >> mean exposing and merging these `Is` functions.
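
As a rough illustration of the "looks like" check described above, here is a Python sketch. It is not the code from llvm-objcopy. It assumes the check is a case-insensitive substring match on the file name (with a trailing `.exe` dropped) that only counts when the match is not immediately followed by a letter or digit. The function names `looks_like` and `objcopy_mode` are hypothetical.

```python
import os


def looks_like(argv0: str, tool: str) -> bool:
    """Assumed rule: `tool` occurs in the file stem and the character after
    its last occurrence is not alphanumeric (or the stem ends there)."""
    stem = os.path.basename(argv0).lower()
    if stem.endswith(".exe"):
        stem = stem[: -len(".exe")]
    start = stem.rfind(tool.lower())
    if start == -1:
        return False
    end = start + len(tool)
    return end == len(stem) or not stem[end].isalnum()


def objcopy_mode(argv0: str) -> str:
    """Return "strip" for strip-like names, otherwise the default "objcopy"."""
    return "strip" if looks_like(argv0, "strip") else "objcopy"


for name in ["llvm-strip", "strip.exe", "gnu-llvm-strip-10", "llvm-objcopy"]:
    print(name, "->", objcopy_mode(name))
```

A busybox-style dispatcher would need one such predicate per tool and a defined order in which to try them, which is presumably why the proposal mentions exposing and merging the `Is` functions.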
>> >>>> >>
>> >>>> >> Some open questions:
>> >>>> >> - People's initial thoughts/opinions?
>> >>>> >> - Are there existing tools in LLVM that already do this?
>> >>>> >> - Other implementation details/global states that we would also need
>> >>>> >>   to account for?
>> >>>> >>
>> >>>> >> - Leonard
>> >>>> >>
>> >>>>
>> >>>> >_______________________________________________
>> >>>> >LLVM Developers mailing list
>> >>>> >llvm-dev at lists.llvm.org
>> >>>> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>>>
>> >>>
>>
>>
>>
>> --
>> 宋方睿
--
宋方睿