Reid Kleckner
2015-May-29 23:05 UTC
[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux
Happened to me again:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18273/steps/annotate/logs/stdio

In fact, this whole bot has a 20% failure rate with the same failure mode, from looking at the history:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/?numbuilds=50

They all end with this:

[100%] Running ThreadSanitizer tests
-- Testing: 258 tests, 16 threads --
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
command timed out: 1200 seconds without output, attempting to kill

It seems like we'd get a lot more value from this bot if we just disabled the tsan tests, or at least whichever tests have the highest deadlock risk.

On Thu, May 14, 2015 at 1:08 PM, Kostya Serebryany <kcc at google.com> wrote:
> +dvyukov
>
> On Wed, May 13, 2015 at 11:08 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> On Wed, May 13, 2015 at 10:39 AM, Reid Kleckner <rnk at google.com> wrote:
>>> It's a 20m timeout without output.
>>>
>>> If you back up to the build and look at the 'annotate' step output,
>>> there's this text:
>>>
>>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/annotate/logs/stdio
>>>
>>> -- Testing: 258 tests, 16 threads --
>>> Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
>>> command timed out: 1200 seconds without output, attempting to kill
>>> process killed by signal 9
>>> program finished with exit code -1
>>> elapsedTime=3507.624426
>>>
>>> The annotator should probably include that timeout text in the failing
>>> step, so that sounds like a bug.
>>>
>>> Another issue is that tsan times out sometimes.
>>
>> Also - how often are the timeouts actually indicative of regressions?
>> Perhaps we could flag them as "exceptional" results, shown in purple (&
>> possibly not emailing anyone except the buildbot owner) - rather than red
>> failures somehow.
>>
>>> Should we be sending tsan build failures to upstream developers? How
>>> often do they break tsan? I suspect that when LLVM breaks tsan, it also
>>> breaks ASan, which isn't as flaky. It might be better to mail the tsan
>>> failures to Dmitry or someone and not upstream LLVM devs.
>>>
>>> On Wed, May 13, 2015 at 9:59 AM, Diego Novillo <dnovillo at google.com> wrote:
>>>> Alexey, I got mail from one of the tsan buildbots, claiming a breakage
>>>> in tsan tests. But I cannot see anything on the logs it has for the
>>>> build.
>>>>
>>>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/run%2064-bit%20tsan%20unit%20tests/logs/stdio
>>>>
>>>> Any ideas? Thanks. Diego.
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: <llvm.buildmaster at lab.llvm.org>
>>>> Date: Wed, May 13, 2015 at 12:53 PM
>>>> Subject: buildbot failure in LLVM on sanitizer-x86_64-linux
>>>> To: Brendon Cahoon <bcahoon at codeaurora.org>, Diego Novillo
>>>> <dnovillo at google.com>, Teresa Johnson <tejohnson at google.com>, Yaron
>>>> Keren <yaron.keren at gmail.com>
>>>> Cc: gkistanova at gmail.com
>>>>
>>>> The Buildbot has detected a new failure on builder
>>>> sanitizer-x86_64-linux while building llvm.
>>>> Full details are available at:
>>>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916
>>>>
>>>> Buildbot URL: http://lab.llvm.org:8011/
>>>>
>>>> Buildslave for this Build: sanitizer-buildbot1
>>>>
>>>> Build Reason: scheduler
>>>> Build Source Stamp: [branch trunk] 237261
>>>> Blamelist: bcahoon,dnovillo,tejohnson,yrnkrn
>>>>
>>>> BUILD FAILED: failed annotate failed run 64-bit tsan unit tests
>>>>
>>>> sincerely,
>>>> -The Buildbot
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
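The suggestion above to treat timeouts differently from real test failures could be sketched roughly as follows. This is only an illustration, not how the buildbot or its annotator actually works: the function names classify_log and timeout_rate are hypothetical, and the only thing taken from the thread is the "command timed out: 1200 seconds without output" line that appears in the quoted logs.

```python
import re

# Signature copied from the log excerpts quoted in this thread.
TIMEOUT_RE = re.compile(r"command timed out: (\d+) seconds without output")


def classify_log(log_text):
    """Return ("timeout", seconds) for a no-output timeout, else ("other", None).

    Hypothetical helper: a real annotator would need more categories.
    """
    match = TIMEOUT_RE.search(log_text)
    if match:
        return ("timeout", int(match.group(1)))
    return ("other", None)


def timeout_rate(logs):
    """Fraction of the given log texts that ended in a no-output timeout."""
    if not logs:
        return 0.0
    hits = sum(1 for text in logs if classify_log(text)[0] == "timeout")
    return hits / len(logs)
```

Run over the stdio logs of the last 50 builds, something like timeout_rate would give a number comparable to the 20% estimate above, and classify_log could decide whether a failure mail goes to the blamelist or only to the bot owner.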
Dmitry Vyukov
2015-Jun-02 14:07 UTC
[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux
Do we know that 14.4 GB of RAM is enough to run tsan tests with parallelism level 16? I would not be surprised if it is not. Don't yet have a machine to test.

Alexey, reduce parallelism level for tsan tests to 4 on that bot and let's see what happens.
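To make the memory concern concrete, here is a small back-of-the-envelope sketch. The 14.4 GB figure and the parallelism levels 16 and 4 come from the message above; the per-test memory need is not known from this thread, so the value used in the usage note is purely a placeholder assumption, and both function names are hypothetical.

```python
def per_worker_gb(total_ram_gb, workers):
    """RAM available to each test worker if memory is split evenly."""
    return total_ram_gb / workers


def max_workers(total_ram_gb, per_test_gb, cpu_count):
    """Largest worker count that fits in RAM, capped at cpu_count, minimum 1.

    per_test_gb is an assumed peak memory per running test, not a measured
    value for the tsan test suite.
    """
    fit = int(total_ram_gb // per_test_gb)
    return max(1, min(cpu_count, fit))
```

With 14.4 GB, 16 workers leave about 0.9 GB per worker, while 4 workers leave about 3.6 GB each. If a tsan test needed, say, 3 GB at peak (an assumption for illustration only), max_workers(14.4, 3.0, 16) would suggest 4.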
Reid Kleckner
2015-Jun-09 02:50 UTC
[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux
So far as I can tell no one is root causing this, so in the meantime can we disable check-tsan?