Nico Weber via llvm-dev
2020-Sep-01 20:20 UTC
[llvm-dev] [cfe-dev] Can we remove llvmbb from IRC?
On Tue, Sep 1, 2020 at 3:57 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Tue, Sep 1, 2020 at 12:42 PM Nico Weber <thakis at chromium.org> wrote:
>
>> On Tue, Sep 1, 2020 at 3:32 PM David Blaikie <dblaikie at gmail.com> wrote:
>>
>>> On Tue, Sep 1, 2020 at 12:07 PM Nico Weber via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> llvmbb's job is to inform people of build breaks. However, it seems to
>>>> trigger for a big list of bots, and at least one of them seems to
>>>> always be broken,
>>>
>>> If a bot is always broken it shouldn't be sending email/notifications -
>>> generally they are configured only to send email on green>red and
>>> red>green transitions, so if it's already broken you shouldn't be blamed
>>> for it. If you are seeing bot spam or emails from a bot that's already
>>> red, please email llvm-dev and the bot maintainer and ask the bot to be
>>> reconfigured or disabled.
>>>
>>> If a bot is regularly flakey (& thus sending email/notifications that
>>> are false-positives/that no one can act on) please also send email
>>> asking for the bot to be reconfigured or disabled. (or, if you want to
>>> be a bit more punchy - send a patch to the zorg repository to have the
>>> bot disabled & explain why you're proposing that)
>>
>> I agree with this in the abstract, but I get pinged completely reliably
>> at least twice after every single of my commits. This isn't something
>> that sometimes happens, it's something that always happens.
>
> Could you point to specific buildbots/email when that comes up to help
> improve things both on IRC and email/mailing lists, etc?

Just land a change :) Or look at IRC scrollback. Given how easy it is to find these problems, it doesn't seem like there's a lot of appetite for improving this. Hence me asking about removing llvmbb (...and so far everyone seems to be in favor).

In this case, from my IRC scrollback (there's more people on the blamelist, spread over several follow-on IRC messages):

build #13975 of clang-ppc64le-linux-multistage is complete: Failure [failed ninja check 1]
Build details are at http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975
blamelist: LLVM GN Syncbot <llvmgnsyncbot at gmail.com>, Nico Weber <thakis at chromium.org>

build #24132 of clang-with-thin-lto-ubuntu is complete: Failure [failed test-stage1-compiler]
Build details are at http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132
blamelist: Nico Weber <thakis at chromium.org>, Matt Arsenault <Matthew.Arsenault at amd.com>, Eric Astor <epastor at google.com>, Craig Topper <craig.topper at intel.com>, Alina

build #2255 of lld-x86_64-win is complete: Failure [failed test-check-all]
Build details are at http://lab.llvm.org:8011/builders/lld-x86_64-win/builds/2255
blamelist: LLVM GN Syncbot <llvmgnsyncbot at gmail.com>, Eric Astor <epastor at google.com>, Craig Topper <craig.topper at intel.com>, Alina Sbirlea <asbirlea at google.com>, Nico Weber <thakis at chromium.org>, Amara

I also got email with pointers to:
http://green.lab.llvm.org/green//job/clang-stage1-RA/14180/consoleFull#-1417328700a1ca8a51-895e-46c6-af87-ce24fa4cd561

Chances are that there's something genuinely broken somewhere (maybe compiler-rt?), but asking for concrete bots distracts from the point that there's something broken on every single commit, which makes the bot just let you know that you committed something in the last few hours.

>>>> and the broken bots tend to have cycle times of several hours.
>>>
>>> Long cycle times are a real problem - that might be best left to another
>>> discussion about buildbot maintenance - I would be for a policy that
>>> says bot windows shouldn't be longer than, say, an hour or maybe less.
>>> (so, eg: if you have a bot that's just going to take 5 hours to run -
>>> then you need 5 machines that each pickup work every hour, so the blame
>>> lists are smaller) this doesn't solve the problem of being notified 5
>>> hours later about a breakage that was caused by someone else who
>>> committed a few minutes before or after you. Solving that problem will
>>> require a much greater investment in infrastructure to chain buildbots,
>>> possibly use built artefacts from one buildbot to another, etc.
>>>
>>>> So if you're on IRC and you commit something, you get pinged by llvmbb
>>>> for hours afterwards.
>>>>
>>>> Does anyone think llvmbb is useful?
>>>
>>> I sometimes find it useful, but happy to move to llvm-build to get those
>>> notifications. Other folks might not know to do that, though.
>>>
>>>> The best thing about llvmbb I've heard it's easy to just "/ignore
>>>> llvmbb", but if that's what everybody does then why not not have it in
>>>> the first place?
>>>>
>>>> Nico
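For illustration, the rule David describes in the quoted text above (notify only on green>red and red>green transitions, never while a bot stays red) can be sketched as follows. This is a minimal Python sketch, not the actual buildbot or zorg configuration; `Status`, `should_notify`, and `notifications` are hypothetical names introduced only for this example.

```python
from enum import Enum


class Status(Enum):
    GREEN = "green"
    RED = "red"


def should_notify(previous: Status, current: Status) -> bool:
    """Notify only when the builder's status differs from the previous build."""
    return previous != current


def notifications(history):
    """Yield (build_index, previous, current) for every status transition."""
    for i in range(1, len(history)):
        if should_notify(history[i - 1], history[i]):
            yield i, history[i - 1], history[i]


# A bot that goes red, stays red for two more builds, then recovers.
history = [Status.GREEN, Status.RED, Status.RED, Status.RED, Status.GREEN]
for index, prev, curr in notifications(history):
    print(f"build {index}: {prev.value} > {curr.value}")
```

Under this rule the two builds in the middle of the red streak produce no message, so a bot that is "always broken" would stay silent after its first failure.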
David Blaikie via llvm-dev
2020-Sep-02 01:12 UTC
[llvm-dev] [cfe-dev] Can we remove llvmbb from IRC?
I assume you're getting emails in addition to the chat spam? Or are you not/are these bots sending chat spam but not email? If that's the case, yeah, I'd rather have a consistent notification experience - and disable all notifications from a bot if some notifications are disabled (eg: if it's not good enough to be sending email, then it shouldn't be spamming the IRC channel either)

On Tue, Sep 1, 2020 at 1:20 PM Nico Weber <thakis at chromium.org> wrote:

>> Could you point to specific buildbots/email when that comes up to help
>> improve things both on IRC and email/mailing lists, etc?
>
> Just land a change :) Or look at IRC scrollback. Given how easy it is to
> find these problems, it doesn't seem like there's a lot of appetite for
> improving this.

I think there's appetite for changing it in some way - no one enjoys the current state of things. But often people assume it's not changeable, whereas I think it is - and I think it's important that it be changed because if we silence all the bots, then quality is likely to go down. Silencing the IRC bot may still be good - folks should be getting buildbot fail email which is more targeted and not spamming the channel for people who aren't to blame (heck, the bots could send private messages instead, I guess?).

But improving signal/noise should benefit the email, and the bot spam (whichever channel it's in).

> Hence me asking about removing llvmbb (...and so far everyone seems to be
> in favor).
>
> In this case, from my IRC scrollback (there's more people on the
> blamelist, spread over several follow-on IRC messages):
>
> build #13975 of clang-ppc64le-linux-multistage is complete: Failure
> [failed ninja check 1] Build details are at
> http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975
> blamelist: LLVM GN Syncbot <llvmgnsyncbot at gmail.com>, Nico Weber
> <thakis at chromium.org>

That doesn't look like the "always be broken" case. It was green on the build prior to this one ( http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13974 )

Looks like the buildbot triggered correctly, only took the 2 revisions you committed. The test did pass at the prior revision and did fail at that revision - perhaps either the buildbot or the test is flakey? (interestingly the test failed in stage 1 at 13975, then failed in stage 2 at 13976 - then passed again in 13977. Both failures for the same reason "/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/tools/clang/test/Driver/Output/target-override.c.script: line 5: /home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/tools/clang/test/Driver/Output/testbin/i386-clang: No such file or directory" - perhaps some problem with creating the symlink?)

Started an llvm-dev thread to discuss that separately in more detail.

> build #24132 of clang-with-thin-lto-ubuntu is complete: Failure [failed
> test-stage1-compiler] Build details are at
> http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132
> blamelist: Nico Weber <thakis at chromium.org>, Matt Arsenault
> <Matthew.Arsenault at amd.com>, Eric Astor <epastor at google.com>, Craig
> Topper <craig.topper at intel.com>, Alina

Also green on the prior build ( http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24131 ). Went green again after a revert here: http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24140 which matches the commit that made the bot go red - so this looks to be a bot doing what it's meant to do. (varying levels of quality, and 2 hour cycle time isn't ideal by any means, though it found this failure in 5 minutes once it started (but that could be 2 hours after a commit))

What do you think we should do with bots like this? Should long cycle time/long blame list bots (not always the same thing) produce no notifications, and require them to be triaged by the bot owner who then manually sends email/follow-up once a rough guess of blame has been made & checked that it hasn't already been possibly diagnosed, discussed and fixed due to a faster bot or other means?

> build #2255 of lld-x86_64-win is complete: Failure [failed test-check-all]
> Build details are at
> http://lab.llvm.org:8011/builders/lld-x86_64-win/builds/2255 blamelist:
> LLVM GN Syncbot <llvmgnsyncbot at gmail.com>, Eric Astor
> <epastor at google.com>, Craig Topper <craig.topper at intel.com>, Alina
> Sbirlea <asbirlea at google.com>, Nico Weber <thakis at chromium.org>, Amara

Also green on the prior build ( http://lab.llvm.org:8011/builders/lld-x86_64-win/builds/2254 ), and went back to green on the following build. Possibly this was related to the same commit/revert as in the previous bot in this list. It's a fairly fast bot, went red on a build including the revision that committed the xor issue, and green on the next build that included a revert of that patch. I couldn't say for sure, though.

> I also got email with pointers to:
>
> http://green.lab.llvm.org/green//job/clang-stage1-RA/14180/consoleFull#-1417328700a1ca8a51-895e-46c6-af87-ce24fa4cd561

Was red for a few builds then green again here: http://green.lab.llvm.org/green/job/clang-stage1-RA/14183/

Looks like the build that went red and the build that went green (& the fact that the failure was related to libfuzzer) correlates well with this commit: https://github.com/llvm/llvm-project/commit/2665425908e00618074e42155ec922a37f7c9002 and this revert: https://github.com/llvm/llvm-project/commit/7139736261e047e9cca030e2ee5912bf2a16f816

> Chances are that there's something genuinely broken somewhere (maybe
> compiler-rt?), but asking for concrete bots distracts from the point that
> there's something broken on every single commit, which makes the bot just
> let you know that you committed something in the last few hours.

They also contain information about failures - yeah, they might not be yours, but they are often/usually someone's, not just flakey bot failures. If you're suggesting all the bots are unactionable - then perhaps we should turn off all notifications on all of them? I have certainly considered that - and then only enabling bots that are fast/high signal-to-noise/small blame list. Though I imagine that's a bigger discussion.
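As a worked example of the long cycle time / long blame list concern, David's earlier suggestion in this thread (a bot that takes 5 hours to run, served by 5 machines that each pick up work every hour) can be put into numbers. This is a rough sketch under stated assumptions: builds start at evenly staggered times, and commits arrive at a steady rate. The function names and the `commits_per_hour` figure are hypothetical, not measurements from the LLVM bots.

```python
def blame_window_hours(build_hours: float, workers: int) -> float:
    """Hours of commits covered by one build when `workers` machines
    start builds at evenly staggered times."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return build_hours / workers


def expected_blamelist(build_hours: float, workers: int,
                       commits_per_hour: float) -> float:
    """Approximate number of commits on each build's blame list."""
    return commits_per_hour * blame_window_hours(build_hours, workers)


# Hypothetical steady rate of 4 commits per hour.
print(expected_blamelist(build_hours=5, workers=1, commits_per_hour=4))
print(expected_blamelist(build_hours=5, workers=5, commits_per_hour=4))
```

With one machine each build covers five hours of commits; with five staggered machines it covers one hour. The result of any single build still arrives five hours after it starts, which is the part the quoted suggestion says it does not solve.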
Nico Weber via llvm-dev
2020-Sep-02 13:27 UTC
[llvm-dev] [cfe-dev] Can we remove llvmbb from IRC?
On Tue, Sep 1, 2020 at 9:13 PM David Blaikie <dblaikie at gmail.com> wrote:

> I assume you're getting emails in addition to the chat spam? Or are you
> not/are these bots sending chat spam but not email? If that's the case,
> yeah, I'd rather have a consistent notification experience - and disable
> all notifications from a bot if some notifications are disabled (eg: if
> it's not good enough to be sending email, then it shouldn't be spamming
> the IRC channel either)

I received a single email for the greendragon bot. The rest was IRC only. (The greendragon bot didn't send an IRC ping I think.)

> What do you think we should do with bots like this? Should long cycle
> time/long blame list bots (not always the same thing) produce no
> notifications, and require them to be triaged by the bot owner who then
> manually sends email/follow-up once a rough guess of blame has been made &
> checked that it hasn't already been possibly diagnosed, discussed and
> fixed due to a faster bot or other means?

My personal opinion is that we shouldn't have any bots that take more than an hour to cycle send any notifications.
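The one-hour rule proposed in this message could be expressed as a simple filter over builders. The sketch below is only an illustration of that policy: `Builder`, `MAX_CYCLE`, `may_notify`, and the builder names and cycle times are all hypothetical and are not taken from the zorg configuration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Builder:
    name: str
    cycle_time: timedelta  # time for one full build-and-test cycle


# Threshold from the proposal: bots slower than this stay silent.
MAX_CYCLE = timedelta(hours=1)


def may_notify(builder: Builder, limit: timedelta = MAX_CYCLE) -> bool:
    """A builder may send notifications only if it cycles within the limit."""
    return builder.cycle_time <= limit


# Hypothetical builders, for illustration only.
builders = [
    Builder("fast-bot", timedelta(minutes=20)),
    Builder("slow-bot", timedelta(hours=2)),
]
print([b.name for b in builders if may_notify(b)])
```

Slow builders would still run and report to their dashboards under such a policy; only the IRC and email notifications would be suppressed.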