thr3ads.net - llvm dev - [llvm-dev] LLD's default --build-id choice [Jun 2016]

Rui Ueyama via llvm-dev

2016-Jun-01 22:21 UTC

[llvm-dev] LLD's default --build-id choice

Nico brought up this topic and made me think whether the current choice of
--build-id was the right one or not.

Currently, we compute an FNV1 hash of the entire output file and store it
in the .note.gnu.build-id section. It's one of the slowest parts of the linker
because reading every byte takes time. IIRC, it usually takes about 10% of
total link time.
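
For reference, the hashing step described above is a single pass over every byte of the output. Below is a minimal Python sketch of textbook 64-bit FNV-1, using the standard offset basis and prime. The name `fnv1_64` is made up for illustration, and this is not LLD's actual code; it only shows why the cost grows with the size of the file.

```python
# Standard 64-bit FNV parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1_64(data: bytes) -> int:
    """Textbook 64-bit FNV-1: multiply by the prime, then XOR in the byte."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64  # keep the state within 64 bits
        h ^= byte
    return h
```

The per-byte work is tiny, so the running time is dominated by having to touch every byte of the output once.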

In the first place, I believe it was not a good decision to make GCC (and
therefore Clang) pass the --build-id option to the linker by default (it was
done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>).
Build ID is sometimes useful, particularly when distributing linked objects
to users, but in most cases it is not needed. Spending 10% more time on the
usual build-link-debug cycle is a waste of time. It should not have been
added that casually.

Anyways, the option is there and passed to the linker, so we have to create
and add a build ID if the --build-id option is given (we could ignore the
option, but that would probably be very confusing).

So here's my proposal.

 - Make --build-id=uuid the default for --build-id

--build-id=uuid sets the build ID to a random unique value. It's very fast.
However, it breaks build reproducibility because every build has a unique
build ID. But if you want build reproducibility, you can explicitly pass
--build-id=sha1.
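
The trade-off between the two modes can be sketched in a few lines of Python. The helper names `build_id_uuid` and `build_id_sha1` and the stand-in `image` bytes are hypothetical; this only illustrates random versus content-derived IDs, not how any linker implements them.

```python
import hashlib
import uuid


def build_id_uuid() -> bytes:
    """Random 16-byte ID: never reads the output, but differs on every link."""
    return uuid.uuid4().bytes


def build_id_sha1(output: bytes) -> bytes:
    """Content-derived 20-byte ID: the same bytes always give the same ID."""
    return hashlib.sha1(output).digest()


# Stand-in for a linked output file (not a real ELF image).
image = b"\x7fELF" + bytes(4096)
```

Calling `build_id_sha1(image)` twice yields the same ID, which is what reproducible builds rely on; two calls to `build_id_uuid()` do not.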

Rafael Espíndola via llvm-dev

2016-Jun-01 22:32 UTC

On 1 June 2016 at 15:21, Rui Ueyama via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Nico brought up this topic and made me think whether the current choice of
> --build-id was the right one or not.
>
> Currently, we compute a FNV1 hash for an entire resulting file and store it
> to .note.gnu.build-id section. It's one of the slowest parts of the linker
> because reading every byte takes time. IIRC, it usually takes about 10% of
> total link time.
>
> In the first place, I believe it was not a good decision to make GCC (and
> therefore Clang) to pass --build-id option to the linker by default (it was
> done in 2009). Build ID is sometimes useful, particularly when distributing
> linked objects to users, but in most cases it is not needed. Spending 10%
> more time on usual build-link-debug cycle is a waste of time. It should not
> have been added that casually.
>
> Anyways, the option is there and passed to the linker, so we have to create
> and add a build ID if --build-id option is given (we could ignore the option
> but that's probably very confusing.)
>
> So here's my proposal.
>
>  - Make --build-id=uuid as default for --build-id
>
> --build-id=uuid sets build-id to a random unique value. It's very fast.
> Instead, it breaks build reproducibility because every build has a unique
> build ID. But if you want build reproducibility, you can explicitly pass
> --build-id=sha1.

Please don't, reproducible builds are *really* important.

Note that you can disable build-id with -Wl,--build-id=none.

Maybe make the default an even simpler hash? Or hash just parts of the file?

I would also be open to just changing clang to not pass --build-id by default.

Cheers,
Rafael

Sean Silva via llvm-dev

2016-Jun-01 22:34 UTC

Personally I don't like making things nondeterministic by default. I would
prefer to change clang to just not pass --build-id by default (or not by
default on -O0 or whatever).

-- Sean Silva

On Wed, Jun 1, 2016 at 3:21 PM, Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Nico brought up this topic and made me think whether the current choice of
> --build-id was the right one or not.
>
> Currently, we compute a FNV1 hash for an entire resulting file and store
> it to .note.gnu.build-id section. It's one of the slowest parts of the
> linker because reading every byte takes time. IIRC, it usually takes about
> 10% of total link time.
>
> In the first place, I believe it was not a good decision to make GCC (and
> therefore Clang) to pass --build-id option to the linker by default (it
> was done in 2009
> <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>). Build ID is
> sometimes useful, particularly when distributing linked objects to users,
> but in most cases it is not needed. Spending 10% more time on usual
> build-link-debug cycle is a waste of time. It should not have been added
> that casually.
>
> Anyways, the option is there and passed to the linker, so we have to
> create and add a build ID if --build-id option is given (we could ignore
> the option but that's probably very confusing.)
>
> So here's my proposal.
>
>  - Make --build-id=uuid as default for --build-id
>
> --build-id=uuid sets build-id to a random unique value. It's very fast.
> Instead, it breaks build reproducibility because every build has a unique
> build ID. But if you want build reproducibility, you can explicitly pass
> --build-id=sha1.

Rui Ueyama via llvm-dev

2016-Jun-01 22:41 UTC

On Wed, Jun 1, 2016 at 3:32 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> On 1 June 2016 at 15:21, Rui Ueyama via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Nico brought up this topic and made me think whether the current choice of
> > --build-id was the right one or not.
> >
> > Currently, we compute a FNV1 hash for an entire resulting file and store it
> > to .note.gnu.build-id section. It's one of the slowest parts of the linker
> > because reading every byte takes time. IIRC, it usually takes about 10% of
> > total link time.
> >
> > In the first place, I believe it was not a good decision to make GCC (and
> > therefore Clang) to pass --build-id option to the linker by default (it was
> > done in 2009). Build ID is sometimes useful, particularly when distributing
> > linked objects to users, but in most cases it is not needed. Spending 10%
> > more time on usual build-link-debug cycle is a waste of time. It should not
> > have been added that casually.
> >
> > Anyways, the option is there and passed to the linker, so we have to create
> > and add a build ID if --build-id option is given (we could ignore the option
> > but that's probably very confusing.)
> >
> > So here's my proposal.
> >
> >  - Make --build-id=uuid as default for --build-id
> >
> > --build-id=uuid sets build-id to a random unique value. It's very fast.
> > Instead, it breaks build reproducibility because every build has a unique
> > build ID. But if you want build reproducibility, you can explicitly pass
> > --build-id=sha1.
>
> Please don't, reproducible builds are *really* important.
>
> Note that you can disable build-id with -Wl,--build-id=none.
>
> Maybe make the default an even simpler hash? Or hash just parts of the
> file?
>
I think FNV1 is a very fast hash function, so we cannot make it faster by
replacing it with some other hash function.

We could hash only some part of the file, say the first page of an
executable. That way, there's a risk that two different executables end up
with the same build ID if they have identical ELF headers, but is it going
to be a problem?
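
The collision risk of hashing only the first page can be made concrete with a small Python sketch. The helper names, the 4096-byte `PAGE_SIZE`, and the fake file contents are all assumptions for illustration, and SHA-1 from `hashlib` stands in for FNV1 purely for brevity.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this illustration


def id_from_first_page(output: bytes) -> bytes:
    """Hash only the first page of the file."""
    return hashlib.sha1(output[:PAGE_SIZE]).digest()


def id_from_whole_file(output: bytes) -> bytes:
    """Hash every byte of the file."""
    return hashlib.sha1(output).digest()


# Two stand-in "executables" that share a first page but differ afterwards.
header = b"\x7fELF" + bytes(PAGE_SIZE - 4)
exe_a = header + b"code for version A"
exe_b = header + b"code for version B"
```

Here `id_from_first_page` gives `exe_a` and `exe_b` the same ID, while `id_from_whole_file` tells them apart.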

> I would also be open to just changing clang to not pass --build-id by
> default.
>
I'd be very happy if we do it.

>
> Cheers,
> Rafael

Joerg Sonnenberger via llvm-dev

2016-Jun-01 23:57 UTC

On Wed, Jun 01, 2016 at 03:21:08PM -0700, Rui Ueyama via llvm-dev wrote:
> In the first place, I believe it was not a good decision to make GCC (and
> therefore Clang) to pass --build-id option to the linker by default (it was
> done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>).
> Build ID is sometimes useful, particularly when distributing linked objects
> to users, but in most cases it is not needed. Spending 10% more time on
> usual build-link-debug cycle is a waste of time. It should not have been
> added that casually.

I fully agree on this (not passing it down by default automatically),
since it doesn't create a very useful key.

> --build-id=uuid sets build-id to a random unique value. It's very fast.
> Instead, it breaks build reproducibility because every build has a unique
> build ID. But if you want build reproducibility, you can explicitly pass
> --build-id=sha1.

I think this is worse than not doing anything at all. What about the
hash tree options, those can at least be computed piecewise and
concurrently?
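
One way to read the hash tree suggestion is a two-level scheme: hash fixed-size chunks independently, then hash the list of chunk digests. The Python sketch below is only an illustration under that assumption; the name `tree_hash`, the 1 MiB chunk size, and the use of SHA-1 and a thread pool are choices made here, not something specified in the thread.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB; arbitrary choice for this sketch


def tree_hash(output: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Two-level hash tree: hash chunks concurrently, then hash their digests."""
    chunks = [output[i:i + chunk_size] for i in range(0, len(output), chunk_size)]
    with ThreadPoolExecutor() as pool:
        # map() returns results in input order, so the final ID is deterministic
        # no matter which worker finishes first.
        leaf_digests = list(pool.map(lambda c: hashlib.sha1(c).digest(), chunks))
    return hashlib.sha1(b"".join(leaf_digests)).digest()
```

The result stays reproducible for a fixed chunk size, and the leaf hashes can be spread across cores. Every byte is still read once, so the total work is not reduced, only parallelized.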

Joerg

Rui Ueyama via llvm-dev

2016-Jun-02 00:03 UTC

On Wed, Jun 1, 2016 at 4:57 PM, Joerg Sonnenberger via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Wed, Jun 01, 2016 at 03:21:08PM -0700, Rui Ueyama via llvm-dev wrote:
> > In the first place, I believe it was not a good decision to make GCC (and
> > therefore Clang) to pass --build-id option to the linker by default (it was
> > done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>).
> > Build ID is sometimes useful, particularly when distributing linked objects
> > to users, but in most cases it is not needed. Spending 10% more time on
> > usual build-link-debug cycle is a waste of time. It should not have been
> > added that casually.
>
> I fully agree on this (not passing it down by default automatically),
> since it doesn't create a very useful key.
>
> > --build-id=uuid sets build-id to a random unique value. It's very fast.
> > Instead, it breaks build reproducibility because every build has a unique
> > build ID. But if you want build reproducibility, you can explicitly pass
> > --build-id=sha1.
>
> I think this is worse than not doing anything at all. What about the
> hash tree options, those can at least be computed piecewise and
> concurrently?
>
We could, but it would only mitigate the issue. If the decision was wrong in
the first place, I want to fix it completely if it's not too late.

Ed Maste via llvm-dev

2016-Jun-02 14:51 UTC

On 1 June 2016 at 19:57, Joerg Sonnenberger via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On Wed, Jun 01, 2016 at 03:21:08PM -0700, Rui Ueyama via llvm-dev wrote:
> > In the first place, I believe it was not a good decision to make GCC (and
> > therefore Clang) to pass --build-id option to the linker by default (it was
> > done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>).
> > Build ID is sometimes useful, particularly when distributing linked objects
> > to users, but in most cases it is not needed. Spending 10% more time on
> > usual build-link-debug cycle is a waste of time. It should not have been
> > added that casually.
>
> I fully agree on this (not passing it down by default automatically),
> since it doesn't create a very useful key.

I agree that it probably doesn't provide enough benefit in a usual
edit-compile-test development cycle to justify a slowdown.

build-id has two main use cases for debugging:
- make core dumps self-identifying (so you can just run "lldb
  foo.core" and load foo automatically)
- avoid checksumming the whole file when loading standalone debug files

In the case of released, prebuilt software (i.e., distribution
packages) I'd argue that build-id does in fact create a useful key.
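
As an illustration of the build ID acting as a lookup key for standalone debug files, here is a Python sketch of the GDB-style `.build-id` directory layout, where the first byte of the ID names a subdirectory and the remaining bytes name the file. The function name `debug_file_for` is hypothetical, and the default `/usr/lib/debug` root is an assumption that varies between systems.

```python
from pathlib import PurePosixPath


def debug_file_for(build_id: bytes, debug_dir: str = "/usr/lib/debug") -> PurePosixPath:
    """Map a build ID to its debug file path using the .build-id layout."""
    hex_id = build_id.hex()
    # First two hex digits become the directory, the rest the file name.
    return PurePosixPath(debug_dir) / ".build-id" / hex_id[:2] / (hex_id[2:] + ".debug")
```

With this layout a debugger can go from the ID stored in a binary or core dump straight to a path, without checksumming any candidate files.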

That said, rather than having the default built into the compiler, a
distribution or OS package build infrastructure should just pass in
the --build-id option through CFLAGS somehow.
