Florian Hahn via llvm-dev
2021-Sep-22 09:45 UTC
[llvm-dev] False positive notifications around commit notifications
Hi Philip,

> On Sep 9, 2021, at 23:18, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Flaky Builders
>
> ex: https://lab.llvm.org/buildbot/#/builders/68/builds/18250
>
> We have many build bots which are not entirely stable. It's gotten to the point where I *expect* failure notifications on literally every change I land. I've been trying to reach out to individual build bot owners to get issues resolved, and to their credit, most owners have been very responsive. However, we have enough builders that the situation isn't getting meaningfully better.
>
> Recommendation: Introduce specific "test commits" whose only purpose is to run the CI infrastructure. Any builder which notifies of failure on such a commit (and only said commit) is disabled without discussion until human action is taken by the bot owner to re-enable. The idea here is to a) automate the process, and b) shift the responsibility of action to the bot owner for any flaky bot.

Thanks for raising this issue! My experience matches what you are describing. For me, the ratio seems to be at least 10 false positives due to flakiness for every real failure.

I think it would be good to have some sort of policy spelling out the requirements for having notifications enabled for a buildbot, with a process that makes it easy to disable flaky bots until the owners can make them more stable. It would be good if notifications could be disabled without requiring contact with, or intervention from, individual owners, but I am not sure if that’s possible with buildbot.

Cheers,
Florian
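To make the quoted "test commit" recommendation concrete, here is a minimal Python sketch of the bookkeeping it implies. The `Builder` record and both function names are hypothetical and are not part of any buildbot API; whether buildbot itself can toggle notifications this way is exactly the open question raised above.

```python
from dataclasses import dataclass


@dataclass
class Builder:
    """Hypothetical record for one build bot; not a buildbot API type."""
    name: str
    notifications_enabled: bool = True


def apply_test_commit_policy(builders, failed_on_test_commit):
    """Disable notifications for builders that failed on a no-op test commit.

    builders: dict mapping builder name -> Builder.
    failed_on_test_commit: names of builders that reported a failure on a
        build whose only change was the test commit.
    Returns the sorted names of builders that were newly disabled.
    """
    disabled = []
    for name in sorted(failed_on_test_commit):
        builder = builders.get(name)
        if builder is not None and builder.notifications_enabled:
            builder.notifications_enabled = False
            disabled.append(name)
    return disabled


def reenable(builders, name):
    """Explicit action by the bot owner to turn notifications back on."""
    builders[name].notifications_enabled = True


# Example: three bots, one of which fails on the test commit.
builders = {n: Builder(n) for n in ("bot-a", "bot-b", "bot-c")}
newly_disabled = apply_test_commit_policy(builders, {"bot-b"})
```

The sketch only tracks state: a failure on a commit that cannot have broken anything is treated as evidence of flakiness, and re-enabling requires a deliberate call by the owner, which is the shift of responsibility the recommendation describes.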
Martin Storsjö via llvm-dev
2021-Sep-22 09:50 UTC
[llvm-dev] False positive notifications around commit notifications
On Wed, 22 Sep 2021, Florian Hahn via llvm-dev wrote:

> Thanks for raising this issue! My experience matches what you are
> describing. For me, the ratio seems to be at least 10 false positives
> due to flakiness for every real failure.
>
> I think it would be good to have some sort of policy spelling out the
> requirements for having notifications enabled for a buildbot, with a
> process that makes it easy to disable flaky bots until the owners can
> make them more stable. It would be good if notifications could be
> disabled without requiring contact with, or intervention from,
> individual owners, but I am not sure if that’s possible with buildbot.

Another aspect is that some tests can be flaky: they might work seemingly fine in local testing but start showing up as timeouts or spurious failures when run in a CI/buildbot setting. And because they are flaky, it's not evident when the breakage was introduced. Over time, such flaky tests and setups add up to the situation we have today.

// Martin
Nemanja Ivanovic via llvm-dev
2021-Oct-06 11:08 UTC
[llvm-dev] False positive notifications around commit notifications
I wonder if it would be possible to make some recommendations for improvements based on data rather than our collective anecdotal experience. Like everyone else, I feel that the vast majority of the failure emails I get are not related to my changes, but I would have a lot of trouble quantifying it any better than a "gut feeling".

Would it be possible to somehow acquire historical data from buildbots to help identify things that can be improved? Perhaps:
- Bot failures where none of the commits were reverted before the bot went back to green
- For those failures, collect the test cases that failed; those might be flaky test cases if they show up frequently and/or on multiple bots
- For bots that have many such instances (especially with different test cases every time), perhaps the bot itself is somehow flaky

This is definitely an annoying problem that has significant consequences (real failures being missed due to many false failures), but it is a difficult problem to solve.

On Wed, Sep 22, 2021 at 5:50 AM Martin Storsjö via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Wed, 22 Sep 2021, Florian Hahn via llvm-dev wrote:
>
> > Thanks for raising this issue! My experience matches what you are
> > describing. For me, the ratio seems to be at least 10 false positives
> > due to flakiness for every real failure.
> >
> > I think it would be good to have some sort of policy spelling out the
> > requirements for having notifications enabled for a buildbot, with a
> > process that makes it easy to disable flaky bots until the owners can
> > make them more stable. It would be good if notifications could be
> > disabled without requiring contact with, or intervention from,
> > individual owners, but I am not sure if that’s possible with buildbot.
>
> Another aspect is that some tests can be flaky: they might work
> seemingly fine in local testing but start showing up as timeouts or
> spurious failures when run in a CI/buildbot setting. And because they
> are flaky, it's not evident when the breakage was introduced. Over
> time, such flaky tests and setups add up to the situation we have today.
>
> // Martin
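The three heuristics listed in this message can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Build` record is a hypothetical, simplified stand-in for whatever the buildbot history actually provides, the set of reverted commits is assumed to be known in advance, and a revert is matched regardless of when it landed. A failure that was fixed by a follow-up commit rather than a revert would be misclassified as unexplained.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Build(NamedTuple):
    """Hypothetical, simplified build record; not the buildbot schema."""
    builder: str
    passed: bool
    commits: tuple        # commit hashes in this build's blamelist
    failed_tests: tuple   # names of failing tests (empty when passed)


def unexplained_failures(builds, reverted):
    """Find failure streaks that went green without any blamed revert.

    builds: iterable of Build, ordered oldest to newest for each builder.
    reverted: set of commit hashes known to have been reverted.
    Returns a list of (builder, failed_tests) pairs, one per streak that
    ended in a passing build with none of its blamed commits in `reverted`.
    """
    streaks = {}  # builder -> (blamed commits, failed tests) of open streak
    results = []
    for b in builds:
        if not b.passed:
            commits, tests = streaks.setdefault(b.builder, (set(), set()))
            commits.update(b.commits)
            tests.update(b.failed_tests)
        elif b.builder in streaks:
            commits, tests = streaks.pop(b.builder)
            if not commits & reverted:
                results.append((b.builder, frozenset(tests)))
    return results


def summarize(episodes):
    """Count unexplained episodes per test and per builder."""
    test_counts = Counter()
    test_builders = defaultdict(set)
    builder_counts = Counter()
    for builder, tests in episodes:
        builder_counts[builder] += 1
        for t in tests:
            test_counts[t] += 1
            test_builders[t].add(builder)
    return test_counts, test_builders, builder_counts


# Made-up history: test_x fails on two bots with no revert; c4 was reverted.
history = [
    Build("bot-a", True, ("c1",), ()),
    Build("bot-a", False, ("c2",), ("test_x",)),
    Build("bot-a", True, ("c3",), ()),
    Build("bot-b", False, ("c2",), ("test_x",)),
    Build("bot-b", True, ("c3",), ()),
    Build("bot-a", False, ("c4",), ("test_y",)),
    Build("bot-a", True, ("c5",), ()),
]
episodes = unexplained_failures(history, reverted={"c4"})
test_counts, test_builders, builder_counts = summarize(episodes)
```

A test that shows up often in `test_counts`, or on several builders in `test_builders`, is a candidate flaky test. A builder with a high `builder_counts` value spread over many different tests is a candidate flaky bot.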
Philip Reames via llvm-dev
2021-Oct-28 20:56 UTC
[llvm-dev] False positive notifications around commit notifications
On 9/22/21 2:45 AM, Florian Hahn wrote:

> Hi Philip,
>
>> On Sep 9, 2021, at 23:18, Philip Reames via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> *Flaky Builders*
>>
>> ex: https://lab.llvm.org/buildbot/#/builders/68/builds/18250
>>
>> We have many build bots which are not entirely stable. It's gotten
>> to the point where I *expect* failure notifications on literally
>> every change I land. I've been trying to reach out to individual
>> build bot owners to get issues resolved, and to their credit, most
>> owners have been very responsive. However, we have enough builders
>> that the situation isn't getting meaningfully better.
>>
>> Recommendation: Introduce specific "test commits" whose only purpose
>> is to run the CI infrastructure. Any builder which notifies of
>> failure on such a commit (and only said commit) is disabled without
>> discussion until human action is taken by the bot owner to
>> re-enable. The idea here is to a) automate the process, and b) shift
>> the responsibility of action to the bot owner for any flaky bot.
>
> Thanks for raising this issue! My experience matches what you are
> describing. For me, the ratio seems to be at least 10 false positives
> due to flakiness for every real failure.
>
> I think it would be good to have some sort of policy spelling out the
> requirements for having notifications enabled for a buildbot, with a
> process that makes it easy to disable flaky bots until the owners can
> make them more stable. It would be good if notifications could be
> disabled without requiring contact with, or intervention from,
> individual owners, but I am not sure if that’s possible with buildbot.

https://reviews.llvm.org/D112755 adds the first pieces of a documented policy around build bot expectations. It does not address the point you raise: the intent was to document existing practice minimally, and thus hopefully be non-controversial. Assuming this moves forward, I plan to revisit this topic in its own review.

> Cheers,
> Florian