thr3ads.net - llvm dev - [llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9 [Aug 2015]

Philip Reames via llvm-dev

2015-Aug-26 16:30 UTC

[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9

On 08/26/2015 08:21 AM, Renato Golin via llvm-dev wrote:
> On 26 August 2015 at 15:44, Tobias Grosser <tobias at grosser.es> wrote:
>> What time-line do you have in mind for this fix? If you are in charge
>> and can make this happen within a day, giving cmake + ninja a chance
>> seems OK.
> It's not my bot. All my bots are CMake+Ninja based and are stable
> enough.
>
>
>> However, if the owner of the buildbot is not known or the fix can not
>> come soon, I am in favor of disabling the noise and (re)enabling it when
>> someone found time to address the problem and verify the solution.
> That's up to Galina. We haven't had any action against unstable bots
> so far, and this is not the only one. There are lots of Windows and
> sanitizer bots that break randomly and provide little information, are
> we going to disable them all? How about the perf bots that still fail
> occasionally and we haven't managed to fix the root cause, are we
> going to disable them, too?

If the bot fails regularly (say false positive rate 1 in 10 runs), then
yes, it should be disabled until the owner fixes it.  It's perfectly
okay for it to be put into a "known unstable" list and for the bot owner
to report failures after they've been confirmed.
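
As a rough illustration of the rule of thumb above (a false positive rate of about 1 in 10 runs), here is a minimal sketch of how bots might be sorted into a "known unstable" list. The `BotHistory` type, the builder names, and the way false positives are counted are all assumptions made for the example; nothing here reflects how the LLVM buildmaster is actually configured.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the "1 in 10 runs" figure in the message.
FALSE_POSITIVE_THRESHOLD = 0.10


@dataclass
class BotHistory:
    name: str             # hypothetical builder name
    runs: int             # total runs in the observation window
    false_positives: int  # failures later judged unrelated to the commit

    def false_positive_rate(self) -> float:
        # A bot with no recorded runs is treated as having a rate of zero.
        return self.false_positives / self.runs if self.runs else 0.0


def known_unstable(bots, threshold=FALSE_POSITIVE_THRESHOLD):
    """Return the names of bots whose false positive rate reaches the threshold."""
    return [b.name for b in bots if b.false_positive_rate() >= threshold]


if __name__ == "__main__":
    history = [
        BotHistory("hypothetical-flaky-bot", runs=100, false_positives=20),
        BotHistory("hypothetical-stable-bot", runs=100, false_positives=3),
    ]
    for name in known_unstable(history):
        print(f"{name}: move to known-unstable list, owner reports confirmed failures")
```

Bots on the resulting list would keep running; only the automatic notification to commit authors would be suppressed.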

To say this differently, we will revert a *change* which is 
problematic.  Why shouldn't we "revert" a bot?

>
> You're asking to reduce considerably the quality of testing on some
> areas so that you can reduce the time spent looking at spurious
> failures. I don't agree with that in principle. There were other
> threads focusing on how to make them less spurious, more stable, less
> noisy, and some work is being done on the GreenDragon bot structure.
> But killing everything that looks suspicious now will reduce our
> ability to validate LLVM on the range of configurations that we do
> today, and that, for me, is a lot worse than a few minutes' worth of
> some engineers.
>
>
>> The cost of
>> buildbot noise is very high, both in terms of developer time spent, but
>> more importantly due to people starting to ignore them when monitoring
>> them becomes costly.
> I think you're overestimating the cost.
>
> When I get bot emails, I click on the link and if it was timeout, I
> always ignore it. If I can't make heads or tails (like the sanitizer
> ones), I ignore it temporarily, then look again next day.

I disagree strongly here.  The cost of having flaky bots is quite high.
When I make a commit, I'm committing to be responsive to problems it 
introduces over the next few hours.  Every one of those false positives 
is a 5-10 minute high priority interruption to what I'm actually working 
on.  In practice, that greatly diminishes my effectiveness.

As an illustrative example, I submitted some documentation changes 
earlier this week and got 5 unique build failure notices.  In this case, 
I ignored them, but if that had been a small code change, that would 
have cost me at least an hour of productivity.

>
> My assumption is that the bot owner will make me aware if the reason
> is not obvious, as I do with my bots. I always wait for people to
> realise, and fix. But if they can't, either because the bot was
> already broken, or because the breakage isn't clear, I let people know
> where to search for the information in the bot itself. This is my
> responsibility as a bot owner.

First, thanks for being a responsible bot owner.  :)

If all bot owners were doing this, having an unstable list which doesn't
actively notify would be completely workable.  If not all bot owners are
doing this, I can't say I really care about the status of those bots.

>
> I appreciate the benefit of having green / red bots, but you also have
> to appreciate that hardware is not perfect, and they will invariably
> fail once in a while. I had some Polly bots failing randomly and it
> took me only a couple of seconds to infer so. I'm not asking to remove
> them, even those that fail more than pass throughout the year. I
> assume that, if they're still there, it provides *some* value to
> someone.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Renato Golin via llvm-dev

2015-Aug-26 16:38 UTC

[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9

On 26 August 2015 at 17:30, Philip Reames <listmail at philipreames.com> wrote:
> To say this differently, we will revert a *change* which is problematic.
> Why shouldn't we "revert" a bot?
I don't disagree, just don't want to do that lightly. Most certainly
not before we have comments from the bot owner.

> As an illustrative example, I submitted some documentation changes earlier
> this week and got 5 unique build failure notices.  In this case, I ignored
> them, but if that had been a small code change, that would have cost me at
> least an hour of productivity.
I have to say, I never spent more than a few minutes looking up
failing bots. If there's nothing that I can find in 30 seconds of
looking at the bot screen, I rely on the bot owners to ping me, revert
my patches, let me know what's wrong.

I'll make your words mine:
> If all bot owners were doing this, having an unstable list which doesn't
> actively notify would be completely workable.  If not all bot owners are
> doing this, I can't say I really care about the status of those bots.
:D

cheers,
--renato

Philip Reames via llvm-dev

2015-Aug-26 16:43 UTC

[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9

On 08/26/2015 09:38 AM, Renato Golin wrote:
> On 26 August 2015 at 17:30, Philip Reames <listmail at philipreames.com> wrote:
>> To say this differently, we will revert a *change* which is problematic.
>> Why shouldn't we "revert" a bot?
> I don't disagree, just don't want to do that lightly. Most certainly
> not before we have comments from the bot owner.

Why?  This is not our policy for commits; why should it be different for
bots?  Comments within a reasonable time window (2 hours?) sure, but an
unresponsive owner can simply re-enable when they get around to it.
Just like the commit author can re-apply at a later time.

>
>
>> As an illustrative example, I submitted some documentation changes earlier
>> this week and got 5 unique build failure notices.  In this case, I ignored
>> them, but if that had been a small code change, that would have cost me at
>> least an hour of productivity.
> I have to say, I never spent more than a few minutes looking up
> failing bots. If there's nothing that I can find in 30 seconds of
> looking at the bot screen, I rely on the bot owners to ping me, revert
> my patches, let me know what's wrong.

The key point I was trying to make was the interruption factor.  Not all
of the notices come in at once.  If I could batch process, it would take
a lot less time.

Also, there are simply some usability issues; finding the actual build 
error on a small screen - from a phone while in a meeting say - is 
rather challenging.  I usually end up having to move to a laptop before 
being able to identify something as a false positive.

>
> I'll make your words mine:
>
>> If all bot owners were doing this, having an unstable list which doesn't
>> actively notify would be completely workable.  If not all bot owners are
>> doing this, I can't say I really care about the status of those bots.
> :D
>
> cheers,
> --renato

David Blaikie via llvm-dev

2015-Aug-26 16:44 UTC

[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9

On Wed, Aug 26, 2015 at 9:38 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 26 August 2015 at 17:30, Philip Reames <listmail at philipreames.com> wrote:
> > To say this differently, we will revert a *change* which is problematic.
> > Why shouldn't we "revert" a bot?
>
> I don't disagree, just don't want to do that lightly. Most certainly
> not before we have comments from the bot owner.
>
>
> > As an illustrative example, I submitted some documentation changes earlier
> > this week and got 5 unique build failure notices.  In this case, I ignored
> > them, but if that had been a small code change, that would have cost me at
> > least an hour of productivity.
>
> I have to say, I never spent more than a few minutes looking up
> failing bots. If there's nothing that I can find in 30 seconds of
> looking at the bot screen, I rely on the bot owners to ping me, revert
> my patches, let me know what's wrong.
>
Which is why this is a hard thing to fix (each individual instance doesn't
cost much) and why it's important: The pain is distributed temporally and
geographically: each of us incurs a small amount of pain regularly, but in
aggregate that's a substantial drag on the project.

- David

>
> I'll make your words mine:
>
> > If all bot owners were doing this, having an unstable list which doesn't
> > actively notify would be completely workable.  If not all bot owners are
> > doing this, I can't say I really care about the status of those bots.
>
> :D
>
> cheers,
> --renato
