thr3ads.net - llvm dev - [LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86

David Blaikie

2015-May-13 18:08 UTC

[LLVMdev] Confusing buildbot failure in LLVM on sanitizer-x86_64-linux

On Wed, May 13, 2015 at 10:39 AM, Reid Kleckner <rnk at google.com> wrote:
> It's a 20m timeout without output.
>
> If you back up to the build and look at the 'annotate' step output,
> there's this text:
>
> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/annotate/logs/stdio
>
> -- Testing: 258 tests, 16 threads --
> Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
> command timed out: 1200 seconds without output, attempting to kill
> process killed by signal 9
> program finished with exit code -1
> elapsedTime=3507.624426
>
> The annotator should probably include that timeout text in the failing
> step, so that sounds like a bug.
>
> Another issue is that tsan times out sometimes.
>
Also, how often are the timeouts actually indicative of regressions?
Perhaps we could flag them as "exceptional" results, shown in purple
(and possibly not emailing anyone except the buildbot owner), rather
than as red failures.
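
As a minimal sketch of what that "exceptional" classification could look like: the function below inspects a step's log text and reports a timeout separately from an ordinary failure. The names `StepResult` and `classify_step` are hypothetical and are not part of Buildbot. The only detail taken from this thread is the `command timed out: ... seconds without output` line in the log.

```python
import re
from enum import Enum


class StepResult(Enum):
    # Hypothetical labels for this sketch, not Buildbot's own constants.
    SUCCESS = "success"
    FAILURE = "failure"      # red: mail the blamelist
    EXCEPTION = "exception"  # purple: mail only the bot owner


# Matches the log line quoted in this thread, for example:
# "command timed out: 1200 seconds without output, attempting to kill"
TIMEOUT_RE = re.compile(
    r"^command timed out: (\d+) seconds without output", re.MULTILINE
)


def classify_step(log_text: str, exit_code: int) -> StepResult:
    """Classify a finished step from its log text and exit code."""
    if TIMEOUT_RE.search(log_text):
        # A silent hang says little about the commits on the blamelist.
        return StepResult.EXCEPTION
    if exit_code != 0:
        return StepResult.FAILURE
    return StepResult.SUCCESS
```

Checking for the timeout line before the exit code matters here, because the killed process also reports a non-zero exit code (-1 in the log above) and would otherwise be counted as a plain failure.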

> Should we be sending tsan build failures to upstream developers? How often
> do they break tsan? I suspect that when LLVM breaks tsan, it also breaks
> ASan, which isn't as flaky. It might be better to mail the tsan failures to
> Dmitry or someone and not upstream LLVM devs.
>
> On Wed, May 13, 2015 at 9:59 AM, Diego Novillo <dnovillo at google.com>
> wrote:
>
>> Alexey, I got mail from one of the tsan buildbots, claiming a breakage
>> in tsan tests. But I cannot see anything on the logs it has for the
>> build.
>>
>>
>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916/steps/run%2064-bit%20tsan%20unit%20tests/logs/stdio
>>
>> Any ideas?  Thanks. Diego.
>>
>>
>> ---------- Forwarded message ----------
>> From:  <llvm.buildmaster at lab.llvm.org>
>> Date: Wed, May 13, 2015 at 12:53 PM
>> Subject: buildbot failure in LLVM on sanitizer-x86_64-linux
>> To: Brendon Cahoon <bcahoon at codeaurora.org>, Diego Novillo
>> <dnovillo at google.com>, Teresa Johnson <tejohnson at google.com>,
>> Yaron Keren <yaron.keren at gmail.com>
>> Cc: gkistanova at gmail.com
>>
>>
>> The Buildbot has detected a new failure on builder
>> sanitizer-x86_64-linux while building llvm.
>> Full details are available at:
>>  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/17916
>>
>> Buildbot URL: http://lab.llvm.org:8011/
>>
>> Buildslave for this Build: sanitizer-buildbot1
>>
>> Build Reason: scheduler
>> Build Source Stamp: [branch trunk] 237261
>> Blamelist: bcahoon,dnovillo,tejohnson,yrnkrn
>>
>> BUILD FAILED: failed annotate failed run 64-bit tsan unit tests
>>
>> sincerely,
>>  -The Buildbot
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

Kostya Serebryany

2015-May-14 20:08 UTC

+dvyukov


Reid Kleckner

2015-May-29 23:05 UTC

Happened to me again:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18273/steps/annotate/logs/stdio

In fact, this whole bot has a 20% failure rate with the same failure mode,
from looking at the history:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/?numbuilds=50

They all end with this:
[100%] Running ThreadSanitizer tests
-- Testing: 258 tests, 16 threads --
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
command timed out: 1200 seconds without output, attempting to kill

It seems like we'd get a lot more value from this bot if we just disabled
the tsan tests, or at least whichever tests have the highest deadlock risk.
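
A rough way to put a number on that failure mode, as a sketch only: given the text of several 'annotate' step logs (fetching them from the buildbot is left out and assumed to happen elsewhere), count how many contain the timeout line. `timeout_rate` and `TIMEOUT_MARKER` are hypothetical names made up for this example.

```python
# Marker text taken from the log excerpt above.
TIMEOUT_MARKER = "command timed out:"


def timeout_rate(logs):
    """Return the fraction of log texts that contain the timeout marker."""
    if not logs:
        return 0.0
    timed_out = sum(1 for text in logs if TIMEOUT_MARKER in text)
    return timed_out / len(logs)
```

With 50 logs of which 10 contain the marker, this returns 0.2, which is the kind of rate described above.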

