thr3ads.net - llvm dev - [llvm-dev] False positive notifications around commit notifications [Oct 2021]

Florian Hahn via llvm-dev

2021-Sep-22 09:45 UTC

[llvm-dev] False positive notifications around commit notifications

Hi Philip,

> On Sep 9, 2021, at 23:18, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Flaky Builders
>
> ex: https://lab.llvm.org/buildbot/#/builders/68/builds/18250
>
> We have many build bots which are not entirely stable.  It's gotten to
> the point where I *expect* failure notifications on literally every change I
> land.  I've been trying to reach out to individual build bot owners to get
> issues resolved, and to their credit, most owners have been very responsive.
> However, we have enough builders that the situation isn't getting meaningfully
> better.
>
> Recommendation: Introduce specific "test commits" whose only
> purpose is to run the CI infrastructure.  Any builder which notifies of failure
> on such a commit (and only said commit) is disabled without discussion until
> human action is taken by the bot owner to re-enable.  The idea here is to a)
> automate the process, and b) shift the responsibility of action to the bot owner
> for any flaky bot.

Thanks for raising this issue! My experience matches what you are describing.
For me, the ratio seems to be at least 10 false positives due to flakiness for
every real failure.

I think it would be good to have some sort of policy spelling out the
requirements for having notification enabled for a buildbot, with a process that
makes it easy to disable flaky bots until the owners can make them more stable.
It would be good if notifications could be disabled without requiring
contacting/interventions from individual owners, but I am not sure if that’s
possible with buildbot.

Cheers,
Florian

Martin Storsjö via llvm-dev

2021-Sep-22 09:50 UTC

On Wed, 22 Sep 2021, Florian Hahn via llvm-dev wrote:
> Thanks for raising this issue! My experience matches what you are
> describing. The false positive rate for me seems to be at least 10 false
> positives due to flakiness to 1 real failure.
> I think it would be good to have some sort of policy spelling out the
> requirements for having notification enabled for a buildbot, with a process
> that makes it easy to disable flaky bots until the owners can make them more
> stable. It would be good if notifications could be disabled without
> requiring contacting/interventions from individual owners, but I am not sure
> if that’s possible with buildbot.

Another aspect is that some tests can be flakey - they might work
seemingly fine in local testing but start showing up as timeouts/spurious
failures when run in a CI/buildbot setting. And due to their flakiness,
it's not evident when the breakage is introduced, but over time, such
flakey tests/setups do add up to the situation we have today.

// Martin

Nemanja Ivanovic via llvm-dev

2021-Oct-06 11:08 UTC

I wonder if it would be possible to make some recommendations for
improvements based on data rather than our collective anecdotal experience.
Much like anyone else, I feel that the vast majority of the failure emails I
get are not related to my changes, but I would have a lot of trouble
quantifying it any better than a "gut feeling".

Would it be possible to somehow acquire historical data from buildbots to
help identify things that can be improved? Perhaps:
- Bot failures where none of the commits were reverted before the bot went
back to green
- For those failures, collect the test cases that failed - those might be
flaky test cases if they show up frequently and/or on multiple bots
- For bots that have many such instances (especially with different test
cases every time), perhaps the bot itself is somehow flaky
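
The three heuristics above could be prototyped roughly as follows. This is only a sketch under stated assumptions: the `Build` record, its `reverted` flag, and the threshold parameters are hypothetical, and it presumes the history has already been exported from the buildbots and cross-referenced against reverts in the repository. It does not use any real buildbot API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Build:
    # Hypothetical record; field names are assumptions, not a buildbot schema.
    builder: str
    number: int
    passed: bool
    failed_tests: tuple = ()
    reverted: bool = False  # a commit blamed for this failure was later reverted


def suspect_failures(builds):
    """Failing builds whose red streak ended in green without any revert."""
    by_builder = defaultdict(list)
    for b in builds:
        by_builder[b.builder].append(b)

    suspects = []
    for history in by_builder.values():
        streak = []
        for b in sorted(history, key=lambda b: b.number):
            if not b.passed:
                streak.append(b)
                continue
            # Bot is green again: keep the streak only if nothing was reverted.
            if streak and not any(f.reverted for f in streak):
                suspects.extend(streak)
            streak = []
        # A streak that is still red at the end of the data is left out.
    return suspects


def flaky_test_candidates(suspects, min_hits=2):
    """Tests that fail repeatedly, or on several bots, in suspect failures."""
    hits = Counter()
    bots = defaultdict(set)
    for b in suspects:
        for test in b.failed_tests:
            hits[test] += 1
            bots[test].add(b.builder)
    return {t: (n, len(bots[t])) for t, n in hits.items()
            if n >= min_hits or len(bots[t]) > 1}


def flaky_bot_candidates(suspects, min_failures=3):
    """Bots with many suspect failures; also reports distinct failure sets."""
    per_bot = defaultdict(list)
    for b in suspects:
        per_bot[b.builder].append(frozenset(b.failed_tests))
    return {bot: (len(sets), len(set(sets)))
            for bot, sets in per_bot.items() if len(sets) >= min_failures}
```

Each result maps a name to a pair of counts: for tests, (number of suspect failures, number of bots affected); for bots, (number of suspect failures, number of distinct sets of failing tests). A bot whose second number is close to the first is failing on different tests every time, which points at the bot rather than at any one test.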

This is definitely an annoying problem that has significant consequences
(real failures being missed due to many false failures), but it is a
difficult problem to solve.

On Wed, Sep 22, 2021 at 5:50 AM Martin Storsjö via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Wed, 22 Sep 2021, Florian Hahn via llvm-dev wrote:
>
> > Thanks for raising this issue! My experience matches what you are
> > describing. The false positive rate for me is seems to be at least 10
> > false positives due to flakiness to 1 real failure.
> > I think it would be good to have some sort of policy spelling out the
> > requirements for having notification enabled for a buildbot, with a
> > process that makes it easy to disable flaky bots until the owners can
> > make them more stable. It would be good if notifications could be
> > disabled without requiring contacting/interventions from individual
> > owners, but I am not sure if that’s possible with buildbot.
>
> Another aspect is that some tests can be flakey - they might work
> seemingly fine in local testing but start showing up as timeouts/spurious
> failures when run in a CI/buildbot setting. And due to their flakiness,
> it's not evident when the breakage is introduced, but over time, such
> flakey tests/setups do add up, to the situation we have today.
>
> // Martin

Philip Reames via llvm-dev

2021-Oct-28 20:56 UTC

On 9/22/21 2:45 AM, Florian Hahn wrote:
> Hi Philip,
>
>> On Sep 9, 2021, at 23:18, Philip Reames via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> *Flaky Builders*
>>
>> ex: https://lab.llvm.org/buildbot/#/builders/68/builds/18250
>>
>> We have many build bots which are not entirely stable.  It's gotten
>> to the point where I *expect* failure notifications on literally
>> every change I land. I've been trying to reach out to individual
>> build bot owners to get issues resolved, and to their credit, most
>> owners have been very responsive.  However, we have enough builders
>> that the situation isn't getting meaningfully better.
>>
>> Recommendation: Introduce specific "test commits" whose only purpose
>> is to run the CI infrastructure.  Any builder which notifies of
>> failure on such a commit (and only said commit) is disabled without
>> discussion until human action is taken by the bot owner to
>> re-enable.  The idea here is to a) automate the process, and b) shift
>> the responsibility of action to the bot owner for any flaky bot.
>>
> Thanks for raising this issue! My experience matches what you are
> describing. The false positive rate for me seems to be at least 10
> false positives due to flakiness to 1 real failure.
>
> I think it would be good to have some sort of policy spelling out the
> requirements for having notification enabled for a buildbot, with a
> process that makes it easy to disable flaky bots until the owners can
> make them more stable. It would be good if notifications could be
> disabled without requiring contacting/interventions from individual
> owners, but I am not sure if that’s possible with buildbot.

https://reviews.llvm.org/D112755 adds the first pieces of some
documented policy around build bot expectations.  It does not address
the point you raise, as the intent was to be a minimal documentation of
existing practice, and thus hopefully be non-controversial, but assuming
this moves forward, I plan to revisit this topic in its own review.

> Cheers,
> Florian
