Nico brought up this topic and made me wonder whether the current choice of --build-id is the right one.

Currently, we compute an FNV1 hash over the entire output file and store it in the .note.gnu.build-id section. It's one of the slowest parts of the linker because reading every byte takes time. IIRC, it usually takes about 10% of total link time.

In the first place, I believe it was not a good decision to make GCC (and therefore Clang) pass the --build-id option to the linker by default (it was done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>). A build ID is sometimes useful, particularly when distributing linked objects to users, but in most cases it is not needed. Spending 10% more time on the usual build-link-debug cycle is a waste of time. It should not have been added that casually.

Anyway, the option is there and is passed to the linker, so we have to create and add a build ID if --build-id is given (we could ignore the option, but that would probably be very confusing).

So here's my proposal.

- Make --build-id=uuid the default for --build-id

--build-id=uuid sets the build ID to a random unique value. It's very fast. The downside is that it breaks build reproducibility, because every build gets a unique build ID. But if you want build reproducibility, you can explicitly pass --build-id=sha1.
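For readers unfamiliar with FNV1, here is a minimal Python sketch of the kind of whole-file hash described above. It assumes the 64-bit FNV-1 variant with the standard published constants; the message does not say which width LLD uses, and the function names are made up for illustration. The loop touches every byte of the image once, which is where the cost comes from.

```python
# Standard 64-bit FNV constants (assumed variant; the thread only says "FNV1").
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1_64(data: bytes) -> int:
    """FNV-1: multiply by the prime first, then XOR in the next byte."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64  # keep the state within 64 bits
        h ^= byte
    return h


def build_id_hex(image: bytes) -> str:
    """Hypothetical helper: render the hash of a linked image as 16 hex digits."""
    return f"{fnv1_64(image):016x}"
```

Because the result depends only on the bytes of the image, linking the same inputs twice gives the same ID, which is the reproducibility property that --build-id=uuid gives up.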
Rafael Espíndola via llvm-dev
2016-Jun-01 22:32 UTC
[llvm-dev] LLD's default --build-id choice
On 1 June 2016 at 15:21, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Nico brought up this topic and made me think whether the current choice of
> --build-id was the right one or not.
>
> Currently, we compute a FNV1 hash for an entire resulting file and store it
> to .note.gnu.build-id section. It's one of the slowest parts of the linker
> because reading every byte takes time. IIRC, it usually takes about 10% of
> total link time.
>
> In the first place, I believe it was not a good decision to make GCC (and
> therefore Clang) to pass --build-id option to the linker by default (it was
> done in 2009). Build ID is sometimes useful, particularly when distributing
> linked objects to users, but in most cases it is not needed. Spending 10%
> more time on usual build-link-debug cycle is a waste of time. It should not
> have been added that casually.
>
> Anyways, the option is there and passed to the linker, so we have to create
> and add a build ID if --build-id option is given (we could ignore the option
> but that's probably very confusing.)
>
> So here's my proposal.
>
> - Make --build-id=uuid as default for --build-id
>
> --build-id=uuid sets build-id to a random unique value. It's very fast.
> Instead, it breaks build reproducibility because every build has a unique
> build ID. But if you want build reproducibility, you can explicitly pass
> --build-id=sha1.

Please don't, reproducible builds are *really* important.

Note that you can disable build-id with -Wl,--build-id=none.

Maybe make the default an even simpler hash? Or hash just parts of the file?

I would also be open to just changing clang to not pass --build-id by default.

Cheers,
Rafael
Personally I don't like making things nondeterministic by default. I would prefer to change clang to just not pass --build-id by default (or not by default on -O0 or whatever).

-- Sean Silva
On Wed, Jun 1, 2016 at 3:32 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> On 1 June 2016 at 15:21, Rui Ueyama via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Nico brought up this topic and made me think whether the current choice of
> > --build-id was the right one or not.
> >
> > Currently, we compute a FNV1 hash for an entire resulting file and store it
> > to .note.gnu.build-id section. It's one of the slowest parts of the linker
> > because reading every byte takes time. IIRC, it usually takes about 10% of
> > total link time.
> >
> > In the first place, I believe it was not a good decision to make GCC (and
> > therefore Clang) to pass --build-id option to the linker by default (it was
> > done in 2009). Build ID is sometimes useful, particularly when distributing
> > linked objects to users, but in most cases it is not needed. Spending 10%
> > more time on usual build-link-debug cycle is a waste of time. It should not
> > have been added that casually.
> >
> > Anyways, the option is there and passed to the linker, so we have to create
> > and add a build ID if --build-id option is given (we could ignore the option
> > but that's probably very confusing.)
> >
> > So here's my proposal.
> >
> > - Make --build-id=uuid as default for --build-id
> >
> > --build-id=uuid sets build-id to a random unique value. It's very fast.
> > Instead, it breaks build reproducibility because every build has a unique
> > build ID. But if you want build reproducibility, you can explicitly pass
> > --build-id=sha1.
>
> Please don't, reproducible builds are *really* important.
>
> Note that you can disable build-id with -Wl,--build-id=none.
>
> Maybe make the default an even simpler hash? Or hash just parts of the
> file?

I think FNV1 is already a very fast hash function, so we cannot make it faster by replacing it with some other hash function. We could hash only some part of the file, say the first page of an executable. With that approach, there's a risk that two different executables get the same build ID if their ELF headers are identical, but is that going to be a problem?

> I would also be open to just changing clang to not pass --build-id by
> default.

I'd be very happy if we do it.

> Cheers,
> Rafael
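To make the trade-off of hashing only the first page concrete, here is a small Python sketch. It assumes a 4096-byte page and the 64-bit FNV-1 variant; both are assumptions, and the names are hypothetical. The two images below share an identical first page but differ later, so the partial hash cannot tell them apart while the whole-file hash can.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1 over the given bytes."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64
        h ^= byte
    return h


def first_page_id(image: bytes, page_size: int = PAGE_SIZE) -> int:
    """Hash only the first page, so the cost no longer grows with file size."""
    return fnv1_64(image[:page_size])


# Two stand-in "executables": same first page, different contents afterwards.
first_page = bytes(PAGE_SIZE)
image_a = first_page + b"code version 1"
image_b = first_page + b"code version 2"

same_partial_id = first_page_id(image_a) == first_page_id(image_b)  # collision
same_full_id = fnv1_64(image_a) == fnv1_64(image_b)                 # no collision
```

Whether such collisions matter depends on how much of what distinguishes two builds ends up in the hashed region.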
Joerg Sonnenberger via llvm-dev
2016-Jun-01 23:57 UTC
[llvm-dev] LLD's default --build-id choice
On Wed, Jun 01, 2016 at 03:21:08PM -0700, Rui Ueyama via llvm-dev wrote:
> In the first place, I believe it was not a good decision to make GCC (and
> therefore Clang) to pass --build-id option to the linker by default (it was
> done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>).
> Build ID is sometimes useful, particularly when distributing linked objects
> to users, but in most cases it is not needed. Spending 10% more time on
> usual build-link-debug cycle is a waste of time. It should not have been
> added that casually.

I fully agree on this (not passing it down by default automatically), since it doesn't create a very useful key.

> --build-id=uuid sets build-id to a random unique value. It's very fast.
> Instead, it breaks build reproducibility because every build has a unique
> build ID. But if you want build reproducibility, you can explicitly pass
> --build-id=sha1.

I think this is worse than not doing anything at all. What about the hash tree options, those can at least be computed piecewise and concurrently?

Joerg
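As a rough illustration of the "piecewise and concurrently" idea, here is a Python sketch of a two-level hash tree: the image is cut into fixed-size chunks, each chunk is hashed independently (here on a thread pool), and the root is a hash of the concatenated chunk digests. The chunk size, the use of SHA-1 and the function name are assumptions made for the example, not what any particular linker does.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB leaves; an arbitrary choice for this sketch


def tree_hash(image: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Two-level hash tree: hash each chunk, then hash the list of digests."""
    chunks = [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the root does not depend
        # on which worker finishes first.
        leaves = list(pool.map(lambda chunk: hashlib.sha1(chunk).digest(), chunks))
    return hashlib.sha1(b"".join(leaves)).digest()
```

The result is deterministic for a given chunk size, so reproducibility is kept, but it is not equal to a plain SHA-1 of the whole file, and every byte still has to be read once.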
On Wed, Jun 1, 2016 at 4:57 PM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Wed, Jun 01, 2016 at 03:21:08PM -0700, Rui Ueyama via llvm-dev wrote:
> > In the first place, I believe it was not a good decision to make GCC (and
> > therefore Clang) to pass --build-id option to the linker by default (it was
> > done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>).
> > Build ID is sometimes useful, particularly when distributing linked objects
> > to users, but in most cases it is not needed. Spending 10% more time on
> > usual build-link-debug cycle is a waste of time. It should not have been
> > added that casually.
>
> I fully agree on this (not passing it down by default automatically),
> since it doesn't create a very useful key.
>
> > --build-id=uuid sets build-id to a random unique value. It's very fast.
> > Instead, it breaks build reproducibility because every build has a unique
> > build ID. But if you want build reproducibility, you can explicitly pass
> > --build-id=sha1.
>
> I think this is worse than not doing anything at all. What about the
> hash tree options, those can at least be computed piecewise and
> concurrently?

We could, but it would only mitigate the issue. If the decision was wrong in the first place, I want to fix it completely if it's not too late.
On 1 June 2016 at 19:57, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Wed, Jun 01, 2016 at 03:21:08PM -0700, Rui Ueyama via llvm-dev wrote:
> > In the first place, I believe it was not a good decision to make GCC (and
> > therefore Clang) to pass --build-id option to the linker by default (it was
> > done in 2009 <https://lists.debian.org/debian-gcc/2009/07/msg00082.html>).
> > Build ID is sometimes useful, particularly when distributing linked objects
> > to users, but in most cases it is not needed. Spending 10% more time on
> > usual build-link-debug cycle is a waste of time. It should not have been
> > added that casually.
>
> I fully agree on this (not passing it down by default automatically),
> since it doesn't create a very useful key.

I agree that it probably doesn't provide enough benefit in a usual edit-compile-test development cycle to justify a slowdown.

build-id has two main use cases for debugging:

- make core dumps self-identifying (so you can just run "lldb foo.core" and load foo automatically)
- avoid checksumming the whole file when loading standalone debug files

In the case of released, prebuilt software (i.e., distribution packages) I'd argue that build-id does in fact create a useful key. That said, rather than having the default built into the compiler, a distribution or OS package build infrastructure should just pass in the --build-id option through CFLAGS somehow.
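To show what using the build ID as a key can look like on the consumer side, here is a Python sketch that pulls the ID out of the raw contents of a .note.gnu.build-id section. It relies on the standard ELF note layout (three 32-bit words for name size, descriptor size and type, then the name and descriptor, each padded to 4 bytes) and on the GNU build-id note using the name "GNU" and type 3. How the section bytes are obtained is left out, the little-endian default is an assumption, and the function name is made up.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def parse_build_id(section: bytes, byteorder: str = "<") -> Optional[str]:
    """Return the build ID as hex from raw note-section bytes, or None."""
    offset = 0
    while offset + 12 <= len(section):
        namesz, descsz, note_type = struct.unpack_from(byteorder + "III", section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = section[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # descriptor is padded the same way
        if name == b"GNU\x00" and note_type == NT_GNU_BUILD_ID:
            return desc.hex()
    return None
```

A tool that finds the same hex string in a core dump and in a standalone debug file can pair them up without checksumming either file.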