Philip Reames via llvm-dev
2015-Aug-26 16:30 UTC
[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9
On 08/26/2015 08:21 AM, Renato Golin via llvm-dev wrote:
> On 26 August 2015 at 15:44, Tobias Grosser <tobias at grosser.es> wrote:
>> What time-line do you have in mind for this fix? If you are in charge
>> and can make this happen within a day, giving cmake + ninja a chance seems
>> OK.
> It's not my bot. All my bots are CMake+Ninja based and are stable enough.
>
>> However, if the owner of the buildbot is not known or the fix cannot come
>> soon, I am in favor of disabling the noise and (re)enabling it when someone
>> found time to address the problem and verify the solution.
> That's up to Galina. We haven't had any action against unstable bots
> so far, and this is not the only one. There are lots of Windows and
> sanitizer bots that break randomly and provide little information, are
> we going to disable them all? How about the perf bots that still fail
> occasionally and we haven't managed to fix the root cause, are we
> going to disable them, too?

If the bot fails regularly (say false positive rate 1 in 10 runs), then yes, it should be disabled until the owner fixes it. It's perfectly okay for it to be put into a "known unstable" list and for the bot owner to report failures after they've been confirmed.

To say this differently, we will revert a *change* which is problematic. Why shouldn't we "revert" a bot?

> You're asking to reduce considerably the quality of testing on some
> areas so that you can reduce the time spent looking at spurious
> failures. I don't agree with that in principle. There were other
> threads focusing on how to make them less spurious, more stable, less
> noisy, and some work is being done on the GreenDragon bot structure.
> But killing everything that looks suspicious now will reduce our
> ability to validate LLVM on the range of configurations that we do
> today, and that, for me, is a lot worse than a few minutes' worth of
> some engineers.
>
>> The cost of buildbot noise is very high, both in terms of developer
>> time spent, but more importantly due to people starting to ignore them
>> when monitoring them becomes costly.
> I think you're overestimating the cost.
>
> When I get bot emails, I click on the link and if it was timeout, I
> always ignore it. If I can't make heads or tails (like the sanitizer
> ones), I ignore it temporarily, then look again next day.

I disagree strongly here. The cost of having flaky bots is quite high. When I make a commit, I'm committing to be responsive to problems it introduces over the next few hours. Every one of those false positives is a 5-10 minute high priority interruption to what I'm actually working on. In practice, that greatly diminishes my effectiveness.

As an illustrative example, I submitted some documentation changes earlier this week and got 5 unique build failure notices. In this case, I ignored them, but if that had been a small code change, that would have cost me at least an hour of productivity.

> My assumption is that the bot owner will make me aware if the reason
> is not obvious, as I do with my bots. I always wait for people to
> realise, and fix. But if they can't, either because the bot was
> already broken, or because the breakage isn't clear, I let people know
> where to search for the information in the bot itself. This is my
> responsibility as a bot owner.

First, thanks for being a responsible bot owner. :)

If all bot owners were doing this, having an unstable list which doesn't actively notify would be completely workable. If not all bot owners are doing this, I can't say I really care about the status of those bots.

> I appreciate the benefit of having green / red bots, but you also have
> to appreciate that hardware is not perfect, and they will invariably
> fail once in a while. I had some Polly bots failing randomly and it
> took me only a couple of seconds to infer so. I'm not asking to remove
> them, even those that fail more than pass throughout the year. I
> assume that, if they're still there, it provides *some* value to
> someone.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
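
The "known unstable" rule proposed in the reply above (stop actively notifying committers once a bot's false-positive rate reaches roughly 1 in 10 runs) could be sketched as follows. This is only an illustration under stated assumptions: the `BotHistory` type, the builder names, and the run counts are hypothetical, and nothing here reflects how the LLVM buildbot master is actually configured.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed threshold, taken from the "1 in 10 runs" figure in the message above.
FALSE_POSITIVE_THRESHOLD = 0.10


@dataclass
class BotHistory:
    """Hypothetical summary of one builder's recent history."""
    name: str
    total_runs: int
    false_positives: int  # failures later confirmed unrelated to the commit

    @property
    def false_positive_rate(self) -> float:
        # A bot with no recorded runs is treated as having no false positives.
        if self.total_runs == 0:
            return 0.0
        return self.false_positives / self.total_runs


def is_known_unstable(bot: BotHistory) -> bool:
    """True if the bot should stop mailing committers directly."""
    return bot.false_positive_rate >= FALSE_POSITIVE_THRESHOLD


def known_unstable_list(bots: Iterable[BotHistory]) -> List[str]:
    """Names of bots whose failures the owner should confirm before reporting."""
    return [bot.name for bot in bots if is_known_unstable(bot)]


# Made-up example data, for illustration only.
example_bots = [
    BotHistory("hypothetical-flaky-bot", total_runs=40, false_positives=5),
    BotHistory("hypothetical-stable-bot", total_runs=50, false_positives=1),
]
unstable = known_unstable_list(example_bots)
```

Bots on the resulting list would keep building. Only the automatic notification changes, with the owner reporting failures once confirmed, which is the arrangement both sides of the thread describe as workable.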
Renato Golin via llvm-dev
2015-Aug-26 16:38 UTC
[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9
On 26 August 2015 at 17:30, Philip Reames <listmail at philipreames.com> wrote:
> To say this differently, we will revert a *change* which is problematic.
> Why shouldn't we "revert" a bot?

I don't disagree, just don't want to do that lightly. Most certainly not before we have comments from the bot owner.

> As an illustrative example, I submitted some documentation changes earlier
> this week and got 5 unique build failure notices. In this case, I ignored
> them, but if that had been a small code change, that would have cost me at
> least an hour of productivity.

I have to say, I never spent more than a few minutes looking up failing bots. If there's nothing that I can find in 30 seconds of looking at the bot screen, I rely on the bot owners to ping me, revert my patches, let me know what's wrong.

I'll make your words mine:

> If all bot owners were doing this, having an unstable list which doesn't
> actively notify would be completely workable. If not all bot owners are
> doing this, I can't say I really care about the status of those bots.

:D

cheers,
--renato
Philip Reames via llvm-dev
2015-Aug-26 16:43 UTC
[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9
On 08/26/2015 09:38 AM, Renato Golin wrote:
> On 26 August 2015 at 17:30, Philip Reames <listmail at philipreames.com> wrote:
>> To say this differently, we will revert a *change* which is problematic.
>> Why shouldn't we "revert" a bot?
> I don't disagree, just don't want to do that lightly. Most certainly
> not before we have comments from the bot owner.

Why? This is not our policy for commits; why should it be different for bots? Comments within a reasonable time window (2 hours?) sure, but an unresponsive owner can simply re-enable when they get around to it. Just like the commit author can re-apply at a later time.

>> As an illustrative example, I submitted some documentation changes earlier
>> this week and got 5 unique build failure notices. In this case, I ignored
>> them, but if that had been a small code change, that would have cost me at
>> least an hour of productivity.
> I have to say, I never spent more than a few minutes looking up
> failing bots. If there's nothing that I can find in 30 seconds of
> looking at the bot screen, I rely on the bot owners to ping me, revert
> my patches, let me know what's wrong.

The key point I was trying to make was the interruption factor. Not all of the notices come in at once. If I could batch process, it would take a lot less time. Also, there are simply some usability issues; finding the actual build error on a small screen - from a phone while in a meeting say - is rather challenging. I usually end up having to move to a laptop before being able to identify something as a false positive.

> I'll make your words, mine:
>
>> If all bot owners were doing this, having a unstable list which doesn't
>> actively notify would be completely workable. If not all bot owners are
>> doing this, I can't say I really care about the status of those bots.
> :D
>
> cheers,
> --renato
David Blaikie via llvm-dev
2015-Aug-26 16:44 UTC
[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9
On Wed, Aug 26, 2015 at 9:38 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 26 August 2015 at 17:30, Philip Reames <listmail at philipreames.com> wrote:
>> To say this differently, we will revert a *change* which is problematic.
>> Why shouldn't we "revert" a bot?
>
> I don't disagree, just don't want to do that lightly. Most certainly
> not before we have comments from the bot owner.
>
>> As an illustrative example, I submitted some documentation changes earlier
>> this week and got 5 unique build failure notices. In this case, I ignored
>> them, but if that had been a small code change, that would have cost me at
>> least an hour of productivity.
>
> I have to say, I never spent more than a few minutes looking up
> failing bots. If there's nothing that I can find in 30 seconds of
> looking at the bot screen, I rely on the bot owners to ping me, revert
> my patches, let me know what's wrong.

Which is why this is a hard thing to fix (each individual instance doesn't cost much) and why it's important: the pain is distributed temporally and geographically. Each of us incurs a small amount of pain regularly, but in aggregate that's a substantial drag on the project.

- David

> I'll make your words, mine:
>
>> If all bot owners were doing this, having a unstable list which doesn't
>> actively notify would be completely workable. If not all bot owners are
>> doing this, I can't say I really care about the status of those bots.
>
> :D
>
> cheers,
> --renato