Michael Kruse via llvm-dev
2021-Oct-11 19:06 UTC
[llvm-dev] False positive notifications around commit notifications
On Mon, Oct 11, 2021 at 12:57, David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Here's a fun one: https://lab.llvm.org/buildbot/#/builders/164/builds/3428 - a buildbot failure with a single blame (me) - but I hadn't committed in the last few days, so I was confused. Turns out it's from a change committed 3 months ago - and the failure is a timeout.
>
> Given the number of buildbot timeout false positives, I honestly wouldn't be averse to saying timeouts shouldn't produce fail-mail & are the responsibility of buildbot owners to triage. I realize we can actually submit code that leads to timeouts, but on balance that seems rare compared to the number of times it's a buildbot configuration issue instead. (though open to debate on that for sure)

Wow, that bot does not collapse buildrequests and is indeed 3 months behind due to not being fast enough to keep up with LLVM's commit rate. Even if the bot were reliable, getting notified 3 months later isn't useful.

From the wildly varying duration the test step takes (5 - 33 minutes; not the build step, it is doing incremental builds), I assume that the worker is running other things in parallel, maybe another worker, such that the build job is sometimes starved and times out. IMHO buildbots should not run other heavy jobs in parallel.

Michael
Chris Lattner via llvm-dev
2021-Oct-12 16:53 UTC
[llvm-dev] False positive notifications around commit notifications
> On Oct 11, 2021, at 12:06 PM, Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Mon, Oct 11, 2021 at 12:57, David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Here's a fun one: https://lab.llvm.org/buildbot/#/builders/164/builds/3428 - a buildbot failure with a single blame (me) - but I hadn't committed in the last few days, so I was confused. Turns out it's from a change committed 3 months ago - and the failure is a timeout.
>>
>> Given the number of buildbot timeout false positives, I honestly wouldn't be averse to saying timeouts shouldn't produce fail-mail & are the responsibility of buildbot owners to triage. I realize we can actually submit code that leads to timeouts, but on balance that seems rare compared to the number of times it's a buildbot configuration issue instead. (though open to debate on that for sure)
>
> Wow, that bot does not collapse buildrequests and is indeed 3 months behind due to not being fast enough to keep up with LLVM's commit rate. Even if the bot were reliable, getting notified 3 months later isn't useful.
>
> From the wildly varying duration the test step takes (5 - 33 minutes; not the build step, it is doing incremental builds), I assume that the worker is running other things in parallel, maybe another worker, such that the build job is sometimes starved and times out. IMHO buildbots should not run other heavy jobs in parallel.

I agree with David that timeout notifications should go only to the owner of the bot.

Separately, is the arc-builder builder actually useful? Should we remove it?

-Chris