thr3ads.net - llvm dev - [LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86

Reid Kleckner

2015-May-29 23:05 UTC

[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux

Happened to me again:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18273/steps/annotate/logs/stdio

In fact, this whole bot has a 20% failure rate with the same failure mode,
from looking at the history:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/?numbuilds=50

They all end with this:
[100%] Running ThreadSanitizer tests
-- Testing: 258 tests, 16 threads --
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
command timed out: 1200 seconds without output, attempting to kill

It seems like we'd get a lot more value from this bot if we just disabled
the tsan tests, or at least whichever tests have the highest deadlock risk.
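The "1200 seconds without output" line means the step is killed after 20 minutes of silence, not after 20 minutes in total (the quoted May 13 log below reports elapsedTime=3507.624426 seconds for the step). A minimal Python sketch of such a silence watchdog follows, assuming a POSIX host. `run_with_silence_timeout` is a hypothetical helper, not buildbot's implementation; it only illustrates why a test run that goes quiet after "90.." gets killed.

```python
import subprocess
import sys
import threading
import time


def run_with_silence_timeout(cmd, silence_timeout):
    """Run cmd, killing it after silence_timeout seconds with no output.

    Returns (returncode, timed_out). Hypothetical helper, not buildbot code.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    lock = threading.Lock()
    last_output = time.monotonic()

    def pump():
        nonlocal last_output
        # read1() returns whatever bytes are available, so partial lines such
        # as the "Testing: 0 .. 10.." progress output also count as output.
        for chunk in iter(lambda: proc.stdout.read1(4096), b""):
            sys.stdout.write(chunk.decode(errors="replace"))
            sys.stdout.flush()
            with lock:
                last_output = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        with lock:
            silent_for = time.monotonic() - last_output
        if silent_for > silence_timeout:
            timed_out = True
            proc.kill()  # SIGKILL on POSIX, like "killed by signal 9"
            break
        time.sleep(0.05)

    proc.wait()
    reader.join(timeout=1)
    return proc.returncode, timed_out
```

Assuming a Makefile-based build, `run_with_silence_timeout(["make", "check-tsan"], 1200)` would mirror the bot's limit: a test that deadlocks without printing anything eventually trips the watchdog even though the overall step has been producing output for much longer.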

On Thu, May 14, 2015 at 1:08 PM, Kostya Serebryany <kcc at google.com> wrote:
> +dvyukov
>
> On Wed, May 13, 2015 at 11:08 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
>
>>
>>
>> On Wed, May 13, 2015 at 10:39 AM, Reid Kleckner <rnk at google.com> wrote:
>>
>>> It's a 20m timeout without output.
>>>
>>> If you back up to the build and look at the 'annotate' step output,
>>> there's this text:
>>>
>>>
>>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/annotate/logs/stdio
>>>
>>> -- Testing: 258 tests, 16 threads --
>>> Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
>>> command timed out: 1200 seconds without output, attempting to kill
>>> process killed by signal 9
>>> program finished with exit code -1
>>> elapsedTime=3507.624426
>>>
>>> The annotator should probably include that timeout text in the failing
>>> step, so that sounds like a bug.
>>>
>>> Another issue is that tsan times out sometimes.
>>>
>>
>> Also - how often are the timeouts actually indicative of regressions?
>> Perhaps we could flag them as "exceptional" results, shown in purple (&
>> possibly not emailing anyone except the buildbot owner) - rather than red
>> failures somehow.
>>
>>
>>> Should we be sending tsan build failures to upstream developers? How
>>> often do they break tsan? I suspect that when LLVM breaks tsan, it also
>>> breaks ASan, which isn't as flaky. It might be better to mail the tsan
>>> failures to Dmitry or someone and not upstream LLVM devs.
>>>
>>> On Wed, May 13, 2015 at 9:59 AM, Diego Novillo <dnovillo at google.com>
>>> wrote:
>>>
>>>> Alexey, I got mail from one of the tsan buildbots, claiming a breakage
>>>> in tsan tests. But I cannot see anything on the logs it has for the
>>>> build.
>>>>
>>>>
>>>>
>>>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/run%2064-bit%20tsan%20unit%20tests/logs/stdio
>>>>
>>>> Any ideas?  Thanks. Diego.
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From:  <llvm.buildmaster at lab.llvm.org>
>>>> Date: Wed, May 13, 2015 at 12:53 PM
>>>> Subject: buildbot failure in LLVM on sanitizer-x86_64-linux
>>>> To: Brendon Cahoon <bcahoon at codeaurora.org>, Diego Novillo
>>>> <dnovillo at google.com>, Teresa Johnson <tejohnson at google.com>, Yaron
>>>> Keren <yaron.keren at gmail.com>
>>>> Cc: gkistanova at gmail.com
>>>>
>>>>
>>>> The Buildbot has detected a new failure on builder
>>>> sanitizer-x86_64-linux while building llvm.
>>>> Full details are available at:
>>>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916
>>>>
>>>> Buildbot URL: http://lab.llvm.org:8011/
>>>>
>>>> Buildslave for this Build: sanitizer-buildbot1
>>>>
>>>> Build Reason: scheduler
>>>> Build Source Stamp: [branch trunk] 237261
>>>> Blamelist: bcahoon,dnovillo,tejohnson,yrnkrn
>>>>
>>>> BUILD FAILED: failed annotate failed run 64-bit tsan unit tests
>>>>
>>>> sincerely,
>>>>  -The Buildbot
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150529/50c96d30/attachment.html>
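The suggestions in the quoted messages (surface the timeout text in the failing step, report silent timeouts as "exceptional" results rather than red failures, and mail the bot owner instead of the upstream blamelist) can be sketched as a small step classifier. This is a hypothetical Python sketch: `classify_step`, `StepReport`, the status strings and the owner address are illustrative names, not buildbot's API. Only the `command timed out:` marker is taken from the logs in this thread.

```python
from dataclasses import dataclass, field

# Marker copied from the bot logs quoted in this thread.
TIMEOUT_MARKER = "command timed out:"


@dataclass
class StepReport:
    status: str                               # "success", "failure" or "exception"
    summary: str                              # text shown on the failing step
    notify: list = field(default_factory=list)  # who gets mailed


def classify_step(log_text, exit_code, blamelist, bot_owner):
    """Hypothetical classifier for one annotated build step."""
    timeout_lines = [line for line in log_text.splitlines()
                     if TIMEOUT_MARKER in line]
    if timeout_lines:
        # Put the timeout text on the step itself and treat it as an
        # infrastructure problem: only the bot owner is mailed.
        return StepReport("exception", timeout_lines[-1], [bot_owner])
    if exit_code != 0:
        # A real test failure still goes to the blamelist.
        return StepReport("failure", f"exit code {exit_code}", list(blamelist))
    return StepReport("success", "", [])
```

The design choice here is that a silent timeout says little about which commit is at fault, so it is routed away from the blamelist, while ordinary non-zero exits keep the existing behavior.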

Dmitry Vyukov

2015-Jun-02 14:07 UTC

[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux

Do we know that 14.4 GB of RAM is enough to run the tsan tests with a
parallelism level of 16? I would not be surprised if it is not. I don't
have a machine to test on yet.
Alexey, please reduce the parallelism level for the tsan tests to 4 on that
bot and let's see what happens.
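The arithmetic behind this question: 16 parallel jobs sharing 14.4 GB get 0.9 GB each on average, while 4 jobs get about 3.6 GB each. A minimal Python sketch for picking a parallelism level from a memory budget follows. The per-test peak memory is an assumed input (the thread gives no measured figure), and `max_parallelism` is a hypothetical helper.

```python
def max_parallelism(total_ram_gb, peak_gb_per_test, reserve_gb=0.0, cap=None):
    """Largest number of concurrent tests whose combined peak RAM fits.

    peak_gb_per_test is an assumed figure that would have to be measured;
    reserve_gb is memory kept back for the OS and the build itself.
    """
    usable = total_ram_gb - reserve_gb
    if usable <= 0 or peak_gb_per_test <= 0:
        raise ValueError("need positive usable RAM and per-test peak")
    # Always allow at least one job, even if a single test exceeds the budget.
    jobs = max(1, int(usable // peak_gb_per_test))
    return min(jobs, cap) if cap is not None else jobs
```

For example, if each tsan test were assumed to peak at 3 GB, `max_parallelism(14.4, 3.0)` gives 4, matching the level suggested above; the result would then be passed to lit's `-j` option.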


Reid Kleckner

2015-Jun-09 02:50 UTC

[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux

So far as I can tell no one is root-causing this, so in the meantime can we
disable check-tsan?


llvm dev - Jun 2015 - [LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux
