On Thu, Jan 3, 2019 at 11:54 PM Kuba Mracek <mracek at apple.com> wrote:
>
> > On Jan 3, 2019, at 1:21 PM, David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> writes:
> >
> >> What you're seeing is just the fact that lit is waiting on
> >> subprocesses (select is waiting on the pipes i suspect).
> >
> > Right. Some digging revealed that it is waiting on
> > getline_nohang.cc.tmp, a tsan test.
> >
> > I see that this test has been disabled for NetBSD, due to it sometimes
> > failing. I'm seeing the same on Linux.
> >
> > How can we stabilize the sanitizer tests so that check-all can work
> > reliably? If some sanitizer tests are so flaky, I should think they
> > should be marked UNSUPPORTED. Who has the authority to make those
> > determinations?
>
> Dmitry Vyukov does. CC'ing him.

Are there any special repro instructions? I am running all tsan tests
periodically on linux and none of them flakes.

Dmitry Vyukov <dvyukov at google.com> writes:

> Are there any special repro instructions? I am running all tsan tests
> periodically on linux and none of them flakes.

I don't think I'm doing anything especially interesting. I wonder if
lit parallelism has anything to do with it. I tend to run quite wide
(32 or more).

I'm on SLES 12.2, kernel 4.4.21-69-default, x86_64 in case it matters.
I see this test hang pretty frequently.

-David

On Fri, Jan 4, 2019 at 5:55 PM David Greene <dag at cray.com> wrote:
>
> Dmitry Vyukov <dvyukov at google.com> writes:
>
> > Are there any special repro instructions? I am running all tsan tests
> > periodically on linux and none of them flakes.
>
> I don't think I'm doing anything especially interesting. I wonder if
> lit parallelism has anything to do with it. I tend to run quite wide
> (32 or more).
>
> I'm on SLES 12.2, kernel 4.4.21-69-default, x86_64 in case it matters.
> I see this test hang pretty frequently.

Hi David,

The test is specifically a regression test for a deadlock:

// Make sure TSan doesn't deadlock on a file stream lock at program shutdown.
// See https://github.com/google/sanitizers/issues/454

So I wonder if it's not completely fixed. I am sure it does not
reproduce on my machine:

$ clang++ getline_nohang.cc -fsanitize=thread -O1 -g
$ stress ./a.out
192 runs so far, 0 failures
...
17137 runs so far, 0 failures
17377 runs so far, 0 failures

Could you please attach to the hung process with gdb and get a
backtrace of all threads?
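
For anyone trying to reproduce the hang locally, a repeat-runner with a
per-run timeout can help tell a hang from an ordinary failure. The sketch
below is not the `stress` tool used above; the function name
`run_repeatedly`, the timeout value, and the `./a.out` path in the usage
comment are assumptions made for illustration only.

```python
import subprocess


def run_repeatedly(cmd, runs, timeout_s):
    """Run cmd `runs` times and return (passed, failed, hung) counts.

    A run counts as hung if it has not exited within timeout_s seconds.
    """
    passed = failed = hung = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising the timeout.
            hung += 1
            continue
        if result.returncode == 0:
            passed += 1
        else:
            failed += 1
    return passed, failed, hung


# Hypothetical usage, assuming ./a.out was built from getline_nohang.cc:
#   print(run_repeatedly(["./a.out"], runs=1000, timeout_s=30.0))
```

Because `subprocess.run` kills the child on timeout, this only measures how
often the hang occurs. To collect the backtrace requested above, the hung
process has to be left running so that gdb can attach to it (`gdb -p PID`,
then `thread apply all bt`).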