David Blaikie
2015-May-13  18:08 UTC
[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux
On Wed, May 13, 2015 at 10:39 AM, Reid Kleckner <rnk at google.com> wrote:
> It's a 20m timeout without output.
>
> If you back up to the build and look at the 'annotate' step output,
> there's this text:
>
> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/annotate/logs/stdio
>
> -- Testing: 258 tests, 16 threads --
> Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
> command timed out: 1200 seconds without output, attempting to kill
> process killed by signal 9
> program finished with exit code -1
> elapsedTime=3507.624426
>
> The annotator should probably include that timeout text in the failing
> step, so that sounds like a bug.
>
> Another issue is that tsan times out sometimes.

Also, how often are the timeouts actually indicative of regressions? Perhaps we could somehow flag them as "exceptional" results, shown in purple (and possibly not emailing anyone except the buildbot owner), rather than as red failures.

> Should we be sending tsan build failures to upstream developers? How often
> do they break tsan? I suspect that when LLVM breaks tsan, it also breaks
> ASan, which isn't as flaky. It might be better to mail the tsan failures to
> Dmitry or someone and not upstream LLVM devs.
>
> On Wed, May 13, 2015 at 9:59 AM, Diego Novillo <dnovillo at google.com>
> wrote:
>
>> Alexey, I got mail from one of the tsan buildbots, claiming a breakage
>> in tsan tests. But I cannot see anything on the logs it has for the
>> build.
>>
>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/run%2064-bit%20tsan%20unit%20tests/logs/stdio
>>
>> Any ideas? Thanks. Diego.
>>
>> ---------- Forwarded message ----------
>> From: <llvm.buildmaster at lab.llvm.org>
>> Date: Wed, May 13, 2015 at 12:53 PM
>> Subject: buildbot failure in LLVM on sanitizer-x86_64-linux
>> To: Brendon Cahoon <bcahoon at codeaurora.org>, Diego Novillo
>> <dnovillo at google.com>, Teresa Johnson <tejohnson at google.com>, Yaron
>> Keren <yaron.keren at gmail.com>
>> Cc: gkistanova at gmail.com
>>
>> The Buildbot has detected a new failure on builder
>> sanitizer-x86_64-linux while building llvm.
>> Full details are available at:
>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916
>>
>> Buildbot URL: http://lab.llvm.org:8011/
>>
>> Buildslave for this Build: sanitizer-buildbot1
>>
>> Build Reason: scheduler
>> Build Source Stamp: [branch trunk] 237261
>> Blamelist: bcahoon,dnovillo,tejohnson,yrnkrn
>>
>> BUILD FAILED: failed annotate failed run 64-bit tsan unit tests
>>
>> sincerely,
>> -The Buildbot
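The idea of treating "no output" timeouts as exceptional results rather than red failures could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Outcome` enum, `classify`, and `recipients` are hypothetical names invented here, not part of Buildbot or the LLVM annotator scripts; the only thing taken from the thread is the `command timed out: ... seconds without output` line that the log contains.

```python
import re
from enum import Enum


class Outcome(Enum):
    """Hypothetical result categories; not Buildbot's own status constants."""
    SUCCESS = "success"
    FAILURE = "failure"          # would be shown in red
    EXCEPTIONAL = "exceptional"  # would be shown in purple


# Matches the timeout line quoted in the thread, e.g.
# "command timed out: 1200 seconds without output, attempting to kill"
TIMEOUT_RE = re.compile(r"command timed out: (\d+) seconds without output")


def classify(log_text: str, exit_code: int) -> Outcome:
    """Classify a step from its log text and exit code.

    A no-output timeout is reported as EXCEPTIONAL even though the exit
    code is non-zero, so it is not confused with a real test failure.
    """
    if TIMEOUT_RE.search(log_text):
        return Outcome.EXCEPTIONAL
    return Outcome.SUCCESS if exit_code == 0 else Outcome.FAILURE


def recipients(outcome: Outcome, blamelist: list, owner: str) -> list:
    """Decide who gets mail: the blamelist for failures, only the bot
    owner for exceptional results, nobody for successes."""
    if outcome is Outcome.FAILURE:
        return list(blamelist)
    if outcome is Outcome.EXCEPTIONAL:
        return [owner]
    return []
```

With a policy like this, the build in question would have mailed only the bot owner instead of everyone on the blamelist. Whether a timeout can also hide a genuine regression (for example, a deadlock introduced by a commit) is exactly the open question raised above, so this sketch is one possible policy, not a recommendation.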
Kostya Serebryany
2015-May-14  20:08 UTC
+dvyukov
Reid Kleckner
2015-May-29  23:05 UTC
Happened to me again:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18273/steps/annotate/logs/stdio

In fact, this whole bot has a 20% failure rate with the same failure mode, judging from the history:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/?numbuilds=50

They all end with this:

[100%] Running ThreadSanitizer tests
-- Testing: 258 tests, 16 threads --
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
command timed out: 1200 seconds without output, attempting to kill

It seems like we'd get a lot more value from this bot if we just disabled the tsan tests, or at least whichever tests have the highest deadlock risk.

On Thu, May 14, 2015 at 1:08 PM, Kostya Serebryany <kcc at google.com> wrote:
> +dvyukov
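A failure rate like the one quoted above can be estimated mechanically from saved step logs. The sketch below is a rough illustration only: `ended_in_timeout` and `timeout_rate` are hypothetical helpers, and it assumes the logs have already been downloaded as plain text, since fetching them from the buildmaster is out of scope here.

```python
import re
from typing import Iterable

# The timeout line that the failing builds end with, per the thread.
TIMEOUT_RE = re.compile(r"command timed out: \d+ seconds without output")


def ended_in_timeout(log_text: str, tail_lines: int = 5) -> bool:
    """Return True if the timeout message appears in the last few lines.

    Only the tail is checked because the kill message is followed by a
    handful of trailer lines (signal, exit code, elapsed time).
    """
    tail = log_text.strip().splitlines()[-tail_lines:]
    return any(TIMEOUT_RE.search(line) for line in tail)


def timeout_rate(logs: Iterable[str]) -> float:
    """Fraction of the given logs that ended in a no-output timeout."""
    logs = list(logs)
    if not logs:
        return 0.0
    return sum(ended_in_timeout(text) for text in logs) / len(logs)
```

Running `timeout_rate` over the logs of the last 50 builds would give a number comparable to the estimate above, and running it per test suite could help identify which tests carry the highest deadlock risk.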