thr3ads.net - llvm dev - [llvm-dev] [RFC] Generating LLD reproducers on crashes [Apr 2021]

Fangrui Song via llvm-dev

2021-Apr-16 04:30 UTC

[llvm-dev] [RFC] Generating LLD reproducers on crashes

On 2021-04-15, Manoj Gupta via llvm-dev wrote:
>LLD reproducers is something we'd like to have in Chrome OS as well, see
>bug https://bugs.chromium.org/p/chromium/issues/detail?id=1134940 (no
>activity yet).
>Our plan is to create a shell wrapper and re-exec LLD if needed with
>--reproduce. Obviously, if LLD supports creating reproducers natively,
>that'd be great!
>
>-Manoj

The crash report can be easily implemented via a shell script, but is difficult
to implement reliably in the process itself.  When a process crashes,
naturally not everything can work very robustly. The process has to recover
some state, start a .tar writer, collect every touched file and place
them in the .tar writer. There are many steps where things can go wrong. I am
worried about the robustness. Of course, this may be solved by a multiprocess
architecture, but I am not sure we want to pay for that complexity in the LLD
entrypoint itself.
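
As an illustration of the wrapper approach described above, here is a minimal
sketch in Python. It is not from the thread. The function names and the
injectable `run` parameter are hypothetical, and it assumes a POSIX host, where
Python's subprocess module reports a child killed by a signal as a negative
return code. `--reproduce=<file>` is the existing ld.lld option discussed in
this thread.

```python
import subprocess
import sys


def is_crash(returncode):
    """On POSIX, subprocess reports death-by-signal as a negative return code."""
    return returncode < 0


def link_with_reproducer(argv, tar_path, run=subprocess.call):
    """Run the link once; if it crashes, run it again with --reproduce.

    `run` is injectable (hypothetical design choice) so the logic can be
    exercised without a real linker.
    """
    rc = run(argv)
    if is_crash(rc):
        # The second run exists only to capture the inputs. Its own exit
        # status is ignored; the build step still reports the first failure.
        run(list(argv) + ["--reproduce=" + tar_path])
        print("linker crashed; reproducer requested at " + tar_path,
              file=sys.stderr)
    return rc
```

A wrapper like this inherits the limitation raised in the original RFC: if the
inputs change between the first run and the re-run, the tarball may not
reproduce the crash.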

(LLD crashing is not something I hear about often. For some groups it has been
very stable.
The crashes more frequently come from optimizations triggered by
llvm/lib/LTO.
Details about the nature of the crashes would be useful, if Fuchsia/ChromeOS
folks would like to provide them.)

On the other hand, this task seems to me to require a fair amount of
customization.  First we have the tarball size problem. Then, say there is a
common crash and 100 links of a similar kind crash at the same time: do we
write 100 tarballs?  In a controlled environment, for example when there is
some deduplicator or throttling, this may be feasible. The output filename may
need customization as well, and different groups may have different opinions.
It feels to me that a script is indispensable whether or not LLD has the
built-in crash reporting feature. Given that, the built-in C++ crash reporter
code in LLD does not convince me.
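
To make the customization points above concrete (output filename,
deduplication, throttling), here is a small Python sketch of a policy a wrapper
script might apply before asking for a reproducer. It is not from the thread.
The function name, the `lld-repro-` filename scheme and the `max_tarballs` cap
are hypothetical choices, and the check is not safe against several links
racing in the same directory.

```python
import hashlib
import os


def reproducer_path(out_dir, argv, max_tarballs=5):
    """Return a tarball path for this link, or None if none should be written."""
    os.makedirs(out_dir, exist_ok=True)
    # Identical command lines map to the same file name, so an exact repeat
    # of an already-captured link is skipped.
    digest = hashlib.sha256("\0".join(argv).encode()).hexdigest()[:16]
    path = os.path.join(out_dir, "lld-repro-" + digest + ".tar")
    if os.path.exists(path):
        return None
    # Throttle: similar-but-not-identical links are bounded by a simple cap.
    existing = [name for name in os.listdir(out_dir) if name.endswith(".tar")]
    if len(existing) >= max_tarballs:
        return None
    return path
```

Hashing the command line only catches exact repeats; the cap is what bounds
the "100 similar links" case. Different groups would likely want a different
policy here.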


>On Thu, Apr 15, 2021 at 11:23 AM David Blaikie via llvm-dev <
>llvm-dev at lists.llvm.org> wrote:
>
>> On Thu, Apr 15, 2021 at 1:37 AM Petr Hosek via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> >
>> > lld crashes are more rare, but they do happen. For example, we see lld
>> > segfaulting occasionally on our bots. I'd like to fix it, but I don't know
>> > how to reproduce this issue because we never managed to reproduce it
>> > locally. This is primarily where the motivation for this feature came from.
>> > In the case of Clang, we already configure our build to generate
>> > reproducers in a dedicated directory and at the end of the build we upload
>> > its content to a dedicated (short lived) storage bucket. We would like to
>> > do the same with lld and if this feature existed, we would use it in our
>> > build.
>> >
>> > The size of the reproducers is not really an issue; even if they are a
>> > few gigabytes, they're still dwarfed by the size of the debug info, at
>> > least in our build.
>> >
>> > Passing -Wl,--reproduce is something a compiler engineer can do when
>> > debugging an issue locally, but it's not something a bot can do. Even most
>> > developers on our team wouldn't know how to do it which is why the
>> > automatic crash reproducer generation in Clang is so valuable, all that
>> > developers need to do is to follow the instructions without having to
>> > modify the build and we've had great success with it in the case of Clang.
>>
>> Probably would help (if this isn't done already) this part at least
>> (ie: users who don't have this newly proposed feature enabled) if
>> lld's crash reporter printed the command line to run with the extra
>> flag "to reproduce this run <this command>" for discoverability?
>>
>> (not to derail the primary discussion on this thread, which I don't
>> have much opinion on)
>>
>> > I'm leaning towards the second option, that is implementing this feature
>> > directly in lld. The reason is that we most often see lld crashes when
>> > linking Rust code. If we implemented this feature in the Clang driver, we
>> > would also need to do the same inside the Rust driver (and any other
>> > compiler driver that supports lld). If we implement it in lld, we only need
>> > to do it once, so it's more universal.
>> >
>> > On Wed, Apr 14, 2021 at 3:40 PM Fāng-ruì Sòng via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> On Wed, Apr 14, 2021 at 3:27 PM Haowei Wu <haowei at google.com> wrote:
>> >> >
>> >> > > I am skeptical that users will want to have this behavior by
>> >> > > default.
>> >> > > If this behavior is guarded by an option, it might be fine.
>> >> >
>> >> > That's a good point. If the reproducer will be more than a few
>> >> > hundreds MiBs, it is definitely not suitable to be enabled by default. I
>> >> > agree it's better to be guarded by an option flag such as
>> >> > `--gen-lld-crash-reproducer`.
>> >> >
>> >> > On Wed, Apr 14, 2021 at 2:40 PM Fangrui Song <maskray at google.com>
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >> On 2021-04-14, Haowei Wu via llvm-dev wrote:
>> >> >> >*Background / Motivation*
>> >> >> >
>> >> >> >Both clang and lld have the ability to generate a reproducer (an archive
>> >> >> >with input files and invoker script to reproduce the clang/lld build).
>> >> >> >While clang will generate a reproducer archive when a crash happens, lld
>> >> >> >only generates a reproducer when '--reproduce' flag is explicitly set (this
>> >> >> >is equivalent to Clang's -gen-reproducer flag). This is not very helpful
>> >> >> >for debugging lld bugs, particularly when the crash happens in building big
>> >> >> >projects, since it will be unrealistic to set reproducer flags to generate
>> >> >> >reproducer archives for every lld invocation. This design also causes
>> >> >> >troubles when the crash happens on bots only, as in most cases, developers
>> >> >> >do not have access to the file system of these bots. It would be great to
>> >> >> >improve the lld reproducer generation for easier debugging in these
>> >> >> >scenarios.
>> >> >> >
>> >> >> >*Proposal*
>> >> >> >
>> >> >> >Given the use cases and status of clang and lld. I think there are 2
>> >> >> >possible solutions.
>> >> >> >
>> >> >> >*Extend Clang driver*
>> >> >> >In most cases, lld is invoked by the clang driver instead of being invoked
>> >> >> >by the build system directly. Therefore, the clang driver can be changed to
>> >> >> >re-invoke lld with '--reproduce' flags when it detects the lld subprocess
>> >> >> >is crashed.
>> >> >> >
>> >> >> >Advantages:
>> >> >> >    * It probably does not require any changes to the lld and might be
>> >> >> >easier than handling the crash directly in lld.
>> >> >> >
>> >> >> >Disadvantages:
>> >> >> >    * In case when there is a racing condition in the build system, the
>> >> >> >input files might have changed between 1st lld crash and 2nd lld rerun with
>> >> >> >'--reproduce' flag. In this case, the generated lld reproducer archive
>> >> >> >might not be able to trigger a crash, makes it less useful.
>> >> >> >
>> >> >> >*Improve lld reproducer*
>> >> >> >Another way would be to make lld generate a reproducer archive when it
>> >> >> >crashes, just like what clang is doing.
>> >> >> >
>> >> >> >Advantages:
>> >> >> >    * It will work no matter if lld is invoked from Clang or from the build
>> >> >> >system.
>> >> >> >    * It will catch the input file in case the crash is caused by build
>> >> >> >races.
>> >> >> >
>> >> >> >Disadvantages:
>> >> >> >    * It might need a lot of work if lld does not already have a
>> >> >> >sophisticated crash handler. It might still need some plumbing changes in
>> >> >> >clang driver so lld can honor the '-fcrash-diagnostic-dir' flag.
>> >> >> >
>> >> >> >*Comments?*
>> >> >> >Which approach do you prefer? Feel free to share your opinions.
>> >> >>
>> >> >> There is a resource difference between clang -gen-reproducer /
>> >> >> environment variable "FORCE_CLANG_DIAGNOSTICS_CRASH" and ld.lld
>> >> >> --reproduce.
>> >> >>
>> >> >> clang -gen-reproducer produces a source file and a .sh file for one
>> >> >> single translation unit, the space consumption is low.
>> >> >> ld.lld --reproduce can potentially pack a large list of files, which may
>> >> >> take hundreds of megabytes or several gigabytes.
>> >> >>
>> >> >> I am skeptical that users will want to have this behavior by default.
>> >> >> If this behavior is guarded by an option, it might be fine.
>> >>
>> >> I'll retract my words about an option. This behavior looks like it
>> >> needs a fair bit of customization and is build system dependent.
>> >> You can replace the proposed option with a shell script wrapper, which
>> >> is more convenient than implementing the restartable action in the
>> >> clang driver.
>> >> When dealing with linker problems, (I doubt there are many nowadays;
>> >> when there are problems, mostly are LTO problems), I will usually
>> >> change compiler/linker options a bit.
>> >> If you do this, you may only specify the proposed option when all the
>> >> stuff has been done, but then it is only a very small extra step to
>> >> invoke the link again with -Wl,--reproduce.

pawel k. via llvm-dev

2021-Apr-16 05:48 UTC

Not sure it's applicable here, but it may give some guidelines. Not sure how to
do this in the lld/llvm context, with possible Windows/Linux differences etc.

I like the crash handling mechanisms on Ubuntu and Android. I don't like their
nonexistence on Windows. Ubuntu and Android are nice. We should aim for
the whole of LLVM being nice.

How it works on Android:
Data is collected.
The user is notified what will be sent and to whom.
The user has control over whether to send it and can add a message to the
report, possibly describing how and when it happened, what happened, and what
was expected. The app can say in the edit box header what should be given in
the message.

Similarly on Ubuntu, maybe minus the customized message, but again the user has
control over whether to send it and what will be sent.

Also, a bonus idea is working on minimizing the tarball in extra user time or
within the exception handler etc.

Hope it helps.

Best regards,
Pk

pawel k. via llvm-dev

2021-Apr-16 05:59 UTC

Rant mode on, from an old-timer who is a noob in the LLVM world. Please delete
this if it is rubbish.

Another hint may be checking how Chrome or Firefox do it, as they seem nice
in this area too, and trying to reuse the mechanism if the license allows and
it covers the requirements. As a bonus, they span the hosts we may need.

As for the architecture of the solution:
As an old-timer, I'd guess a wrapper script or dirtying the driver isn't
perfect. At worst, wrap it in an exception-catching enclosing process and run
it in such a sandbox-like container. But at best, try to handle it within the
specific process.

Best regards,
Pk

Petr Hosek via llvm-dev

2021-Apr-16 06:14 UTC


[llvm-dev] [RFC] Generating LLD reproducers on crashes

On Thu, Apr 15, 2021 at 9:30 PM Fangrui Song via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On 2021-04-15, Manoj Gupta via llvm-dev wrote:
> >LLD reproducers is something we'd like to have in Chrome OS as well, see
> >bug https://bugs.chromium.org/p/chromium/issues/detail?id=1134940 (no
> >activity yet).
> >Our plan is to create a shell wrapper and re-exec LLD if needed with
> >--reproduce. Obviously, if LLD supports creating reproducers natively,
> >that'd be great!
> >
> >-Manoj
>
> The crash report can be easily implemented via a shell script, but is
> difficult to implement reliably in the process itself.  When a process
> crashes, naturally not everything can work very robustly. The process wants
> to recover some state and starts a .tar writer, collects every touched file
> and places them in the .tar writer. There are many steps where things can go
> afoul. I am worrying about the robustness. Of course, this may be solved by a
> multiprocess architecture, but I am not sure we want to pay the complexity in
> the LLD entrypoint itself.
>
I'm hoping that we could reuse llvm::CrashRecoveryContext just like Clang
does, without needing a multi-process architecture. Furthermore, we might be
able to extract some of the common infrastructure from Clang to LLVM and
then use it in lld. Clang has already solved this, so there's no need to
duplicate the effort.

> (Crashing LLD is not the idea I hear a lot. For some groups it has been
> very stable.
> The crashes are more frequently from some optimizations triggered by
> llvm/lib/LTO.
> The nature of the crashes is useful, if Fuchsia/ChromeOS folks would like
> to provide.)
>
> On the other hand, this task seems to require a fair amount of customization
> to me.  First we have the tarball size problem. Then say there is a common
> crash and 100 links of a similar kind crash at the same time, do we write 100
> tarballs?  In a controlled environment, for example when there is some
> deduplicater or throttling this may be feasible. The output filename may want
> customization as well, and different groups may have different opinions. It
> feels to me that a script, whether or not LLD has the built-in crash
> reporting feature, is indispensable. Then the built-in C++ crash reporter
> code in LLD does not convince me.
>
I cannot speak for others, but at least in our case we are usually limited
by the available memory, and on the machines we have we can only run a
single-digit to low double-digit number of link jobs in parallel. We also
never had this issue with Clang, where we usually run hundreds of parallel
jobs.

Regarding the output name, I'd again take inspiration from Clang and
simply use a temporary filename generated by
llvm::sys::fs::createTemporaryFile. That way we don't need to worry
about two failing link jobs trying to write to the same file. We never
needed the customization for Clang and I doubt we will for lld.
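
For comparison, here is a rough sketch of the wrapper approach discussed
earlier in the thread, combined with the unique-temporary-name idea. It is
only an illustration under stated assumptions: it is written in Python rather
than shell, Python's tempfile.mkstemp stands in for
llvm::sys::fs::createTemporaryFile, and the helper names
(unique_reproducer_path, reproduce_command, run_link) are hypothetical and not
part of lld or Clang. It assumes the wrapper invokes ld.lld directly; when
linking through the clang driver the flag would be passed as
-Wl,--reproduce=<path> instead.

```python
import os
import subprocess
import sys
import tempfile


def crashed(returncode):
    """On POSIX, subprocess reports death by signal as a negative return code."""
    return returncode < 0


def unique_reproducer_path(directory=None):
    """Reserve a unique .tar path so concurrent failing links never collide."""
    fd, path = tempfile.mkstemp(prefix="lld-repro-", suffix=".tar", dir=directory)
    os.close(fd)  # only the reserved name is needed; the linker writes the file
    return path


def reproduce_command(link_argv, tar_path):
    """Return the original link command with --reproduce=<path> appended."""
    return list(link_argv) + ["--reproduce=" + tar_path]


def run_link(link_argv, directory=None):
    """Run the link once; if it died from a signal, rerun it with --reproduce."""
    first = subprocess.run(link_argv)
    if not crashed(first.returncode):
        return first.returncode
    tar_path = unique_reproducer_path(directory)
    # Assumption: the rerun sees the same inputs as the first run. A racy
    # build may have changed them in between, as the RFC points out.
    subprocess.run(reproduce_command(link_argv, tar_path))
    print("linker crashed; reproducer requested at " + tar_path, file=sys.stderr)
    return first.returncode
```

A build could call run_link(["ld.lld", "a.o", "-o", "a.out"]) in place of the
plain link step. Because each reproducer gets its own reserved name, two
failing link jobs do not overwrite each other's output, but the rerun still
has the race-condition weakness that an in-process reproducer would avoid.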

> >On Thu, Apr 15, 2021 at 11:23 AM David Blaikie via llvm-dev
> ><llvm-dev at lists.llvm.org> wrote:
> >
> >> On Thu, Apr 15, 2021 at 1:37 AM Petr Hosek via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> >
> >> > lld crashes are more rare, but they do happen. For example, we see lld
> >> > segfaulting occasionally on our bots. I'd like to fix it, but I don't know
> >> > how to reproduce this issue because we never managed to reproduce it
> >> > locally. This is primarily where the motivation for this feature came from.
> >> > In the case of Clang, we already configure our build to generate
> >> > reproducers in a dedicated directory and at the end of the build we upload
> >> > its content to a dedicated (short lived) storage bucket. We would like to
> >> > do the same with lld and if this feature existed, we would use it in our
> >> > build.
> >> >
> >> > The size of the reproducers is not really an issue; even if they are a
> >> > few gigabytes, they're still dwarfed by the size of the debug info, at
> >> > least in our build.
> >> >
> >> > Passing -Wl,--reproduce is something a compiler engineer can do when
> >> > debugging an issue locally, but it's not something a bot can do. Even most
> >> > developers on our team wouldn't know how to do it which is why the
> >> > automatic crash reproducer generation in Clang is so valuable, all that
> >> > developers need to do is to follow the instructions without having to
> >> > modify the build and we've had great success with it in the case of Clang.
> >>
> >> Probably would help (if this isn't done already) this part at least
> >> (ie: users who don't have this newly proposed feature enabled) if
> >> lld's crash reporter printed the command line to run with the extra
> >> flag "to reproduce this run <this command>" for discoverability?
> >>
> >> (not to derail the primary discussion on this thread, which I don't
> >> have much opinion on)
> >>
> >> > I'm leaning towards the second option, that is implementing this feature
> >> > directly in lld. The reason is that we most often see lld crashes when
> >> > linking Rust code. If we implemented this feature in the Clang driver, we
> >> > would also need to do the same inside the Rust driver (and any other
> >> > compiler driver that supports lld). If we implement it in lld, we only need
> >> > to do it once, so it's more universal.
> >> >
> >> > On Wed, Apr 14, 2021 at 3:40 PM Fāng-ruì Sòng via llvm-dev
> >> > <llvm-dev at lists.llvm.org> wrote:
> >> >>
> >> >> On Wed, Apr 14, 2021 at 3:27 PM Haowei Wu <haowei at google.com> wrote:
> >> >> >
> >> >> > > I am skeptical that users will want to have this behavior by default.
> >> >> > > If this behavior is guarded by an option, it might be fine.
> >> >> >
> >> >> > That's a good point. If the reproducer will be more than a few
> >> >> > hundreds MiBs, it is definitely not suitable to be enabled by default. I
> >> >> > agree it's better to be guarded by an option flag such as
> >> >> > `--gen-lld-crash-reproducer`.
> >> >> >
> >> >> > On Wed, Apr 14, 2021 at 2:40 PM Fangrui Song <maskray at google.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >> On 2021-04-14, Haowei Wu via llvm-dev wrote:
> >> >> >> >*Background / Motivation*
> >> >> >> >
> >> >> >> >Both clang and lld have the ability to generate a reproducer (an archive
> >> >> >> >with input files and invoker script to reproduce the clang/lld build).
> >> >> >> >While clang will generate a reproducer archive when a crash happens, lld
> >> >> >> >only generates a reproducer when '--reproduce' flag is explicitly set (this
> >> >> >> >is equivalent to Clang's -gen-reproducer flag). This is not very helpful
> >> >> >> >for debugging lld bugs, particularly when the crash happens in building big
> >> >> >> >projects, since it will be unrealistic to set reproducer flags to generate
> >> >> >> >reproducer archives for every lld invocation. This design also causes
> >> >> >> >troubles when the crash happens on bots only, as in most cases, developers
> >> >> >> >do not have access to the file system of these bots. It would be great to
> >> >> >> >improve the lld reproducer generation for easier debugging in these
> >> >> >> >scenarios.
> >> >> >> >
> >> >> >> >*Proposal*
> >> >> >> >
> >> >> >> >Given the use cases and status of clang and lld. I think there are 2
> >> >> >> >possible solutions.
> >> >> >> >
> >> >> >> >*Extend Clang driver*
> >> >> >> >In most cases, lld is invoked by the clang driver instead of being invoked
> >> >> >> >by the build system directly. Therefore, the clang driver can be changed to
> >> >> >> >re-invoke lld with '--reproduce' flags when it detects the lld subprocess
> >> >> >> >is crashed.
> >> >> >> >
> >> >> >> >Advantages:
> >> >> >> >    * It probably does not require any changes to the lld and might be
> >> >> >> >easier than handling the crash directly in lld.
> >> >> >> >
> >> >> >> >Disadvantages:
> >> >> >> >    * In case when there is a racing condition in the build system, the
> >> >> >> >input files might have changed between 1st lld crash and 2nd lld rerun with
> >> >> >> >'--reproduce' flag. In this case, the generated lld reproducer archive
> >> >> >> >might not be able to trigger a crash, makes it less useful.
> >> >> >> >
> >> >> >> >*Improve lld reproducer*
> >> >> >> >Another way would be to make lld generate a reproducer archive when it
> >> >> >> >crashes, just like what clang is doing.
> >> >> >> >
> >> >> >> >Advantages:
> >> >> >> >    * It will work no matter if lld is invoked from Clang or from the build
> >> >> >> >system.
> >> >> >> >    * It will catch the input file in case the crash is caused by build
> >> >> >> >races.
> >> >> >> >
> >> >> >> >Disadvantages:
> >> >> >> >    * It might need a lot of work if lld does not already have a
> >> >> >> >sophisticated crash handler. It might still need some plumbing changes in
> >> >> >> >clang driver so lld can honor the '-fcrash-diagnostic-dir' flag.
> >> >> >> >
> >> >> >> >*Comments?*
> >> >> >> >Which approach do you prefer? Feel free to share your opinions.
> >> >> >>
> >> >> >> There is a resource difference between clang -gen-reproducer /
> >> >> >> environment variable "FORCE_CLANG_DIAGNOSTICS_CRASH" and ld.lld --reproduce.
> >> >> >>
> >> >> >> clang -gen-reproducer produces a source file and a .sh file for one
> >> >> >> single translation unit, the space consumption is low.
> >> >> >> ld.lld --reproduce can potentially pack a large list of files, which may
> >> >> >> take hundreds of megabytes or several gigabytes.
> >> >> >>
> >> >> >> I am skeptical that users will want to have this behavior by default.
> >> >> >> If this behavior is guarded by an option, it might be fine.
> >> >>
> >> >> I'll retract my words about an option. This behavior looks like it
> >> >> needs a fair bit of customization and is build system dependent.
> >> >> You can replace the proposed option with a shell script wrapper, which
> >> >> is more convenient than implementing the restartable action in the
> >> >> clang driver.
> >> >> When dealing with linker problems, (I doubt there are many nowadays;
> >> >> when there are problems, mostly are LTO problems), I will usually
> >> >> change compiler/linker options a bit.
> >> >> If you do this, you may only specify the proposed option when all the
> >> >> stuff has been done, but then it is only a very small extra step to
> >> >> invoke the link again with -Wl,--reproduce.
