thr3ads.net - llvm dev - [llvm-dev] [lit] check-all hanging [Jan 2019]

Dmitry Vyukov via llvm-dev

2019-Jan-04 07:18 UTC

[llvm-dev] [lit] check-all hanging

On Thu, Jan 3, 2019 at 11:54 PM Kuba Mracek <mracek at apple.com> wrote:
>
>
> > On Jan 3, 2019, at 1:21 PM, David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> writes:
> >
> >> What you're seeing is just the fact that lit is waiting on
> >> subprocesses (select is waiting on the pipes, I suspect).
> >
> > Right.  Some digging revealed that it is waiting on
> > getline_nohang.cc.tmp, a tsan test.
> >
> > I see that this test has been disabled for NetBSD, due to it sometimes
> > failing.  I'm seeing the same on Linux.
> >
> > How can we stabilize the sanitizer tests so that check-all can work
> > reliably?  If some sanitizer tests are so flaky, I should think they
> > should be marked UNSUPPORTED.  Who has the authority to make those
> > determinations?
>
> Dmitry Vyukov does. CC'ing him.

Are there any special repro instructions? I am running all tsan tests
periodically on linux and none of them flakes.

David Greene via llvm-dev

2019-Jan-04 16:54 UTC

[llvm-dev] [lit] check-all hanging

Dmitry Vyukov <dvyukov at google.com> writes:
> Are there any special repro instructions? I am running all tsan tests
> periodically on linux and none of them flakes.

I don't think I'm doing anything especially interesting.  I wonder if
lit parallelism has anything to do with it.  I tend to run quite wide
(32 or more).

I'm on SLES 12.2, kernel 4.4.21-69-default, x86_64 in case it matters.
I see this test hang pretty frequently.

                            -David

Dmitry Vyukov via llvm-dev

2019-Jan-04 17:16 UTC

[llvm-dev] [lit] check-all hanging

On Fri, Jan 4, 2019 at 5:55 PM David Greene <dag at cray.com> wrote:
> Dmitry Vyukov <dvyukov at google.com> writes:
>
> > Are there any special repro instructions? I am running all tsan tests
> > periodically on linux and none of them flakes.
>
> I don't think I'm doing anything especially interesting.  I wonder if
> lit parallelism has anything to do with it.  I tend to run quite wide
> (32 or more).
>
> I'm on SLES 12.2, kernel 4.4.21-69-default, x86_64 in case it matters.
> I see this test hang pretty frequently.

Hi David,

The test is specifically a regression test for a deadlock:

// Make sure TSan doesn't deadlock on a file stream lock at program shutdown.
// See https://github.com/google/sanitizers/issues/454

So I wonder whether it was ever completely fixed.
It certainly does not reproduce on my machine:

$ clang++ getline_nohang.cc -fsanitize=thread -O1 -g
$ stress ./a.out
192 runs so far, 0 failures
...
17137 runs so far, 0 failures
17377 runs so far, 0 failures
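
For anyone trying to reproduce the hang without a dedicated stress runner, a rough equivalent can be sketched in plain shell. This is only an illustration under stated assumptions: `stress_run` is a hypothetical helper name, it relies on the coreutils `timeout` command being available, and it treats any run that exceeds the time limit as a hang.

```shell
#!/bin/sh
# Hypothetical helper (not part of LLVM or of the tool used above):
# run a command repeatedly and count ordinary failures and hangs.
# usage: stress_run RUNS TIMEOUT_SECONDS COMMAND [ARGS...]
stress_run() {
    runs=$1
    limit=$2
    shift 2
    failures=0
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        i=$((i + 1))
        # coreutils timeout exits with status 124 when the limit is hit
        timeout "$limit" "$@" >/dev/null 2>&1
        status=$?
        if [ "$status" -eq 124 ]; then
            hangs=$((hangs + 1))
        elif [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
        fi
    done
    echo "$runs runs, $failures failures, $hangs hangs"
}

# Example invocation (left as a comment so sourcing the file has no side effects):
#   stress_run 1000 10 ./a.out
```

Note that `timeout` kills the process once the limit is reached, so this only counts hangs. To inspect a hung process with a debugger, the binary has to be run without the time limit and left in its stuck state.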

Could you please attach to the hung process with gdb and get a
backtrace of all threads?
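
The request above can be carried out non-interactively. The following is a minimal sketch, assuming gdb is installed and the user is permitted to attach to the process; `dump_backtraces` is a hypothetical wrapper name, while `-p`, `-batch`, `-ex` and `thread apply all bt` are standard gdb options and commands.

```shell
#!/bin/sh
# Hypothetical wrapper: attach gdb to a running process, print a
# backtrace for every thread, then detach and exit.
# usage: dump_backtraces PID
dump_backtraces() {
    pid=$1
    gdb -p "$pid" -batch -ex "thread apply all bt"
}
```

The PID of the stuck test has to be looked up first, for example with `pgrep -f getline_nohang` (an assumption about how the process appears in the process list). Redirecting the output to a file makes it easy to attach to a reply.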
