Zachary Turner via llvm-dev
2019-Feb-19 19:21 UTC
[llvm-dev] Clarification on expectations of buildbot email notifications
Hi all,

Over the past year or so, all of us have broken the buildbots on many occasions. Usually we get notified on IRC, or via a buildbot email notification sent to everyone on the blamelist. If I happen to be on IRC I'll see the notification, but if not, the next best thing is an email that was automatically sent to me (along with everyone else on the blamelist) from the buildbot with information about the failure. And then finally, I'll occasionally get a response to my commit message telling me that it's broken, and the patch may be reverted with information in the commit message explaining which bot was broken and providing a link to it.

However, we have some buildbots on the public waterfall which are specifically configured not to send emails to people. In some cases it's because the bots are experimental, but there are a handful where the reasoning I've been given is that it "wastes people's time and contributes to build blindness", but we are still expected to keep them green (usually by people manually reaching out to us when they fail, or patches getting reverted and us getting notified of the revert).

It is this last case that I'm concerned about, as it appears to be in direct conflict with our own developer policy [https://llvm.org/docs/DeveloperPolicy.html#id14], which states this:

-----
We prefer for this to be handled before submission but understand that it isn’t possible to test all of this for every submission. Our build bots and nightly testing infrastructure normally finds these problems. A good rule of thumb is to check the nightly testers for regressions the day after your change. Build bots will directly email you if a group of commits that included yours caused a failure. You are expected to check the build bot messages to see if they are your fault and, if so, fix the breakage.

Commits that violate these quality standards (e.g. are very broken) may be reverted. This is necessary when the change blocks other developers from making progress. The developer is welcome to re-commit the change after the problem has been fixed.
-----

I'm sending this email to get a sense of the community's views on this matter. If I'm correctly reading between the lines in the above passage, buildbots which do not send emails should not be subject to the revert-to-green policy. To be honest, it's actually not even clear from reading the above passage where the burden of fixing a "broken" patch on a silent buildbot lies at all - with the patch author or with the bot maintainer.

Would anyone care to weigh in with an unbiased opinion here?
Tom Stellard via llvm-dev
2019-Feb-19 20:02 UTC
[llvm-dev] Clarification on expectations of buildbot email notifications
On 02/19/2019 11:21 AM, Zachary Turner via llvm-dev wrote:
> Would anyone care to weigh in with an unbiased opinion here?

I think for any regressions, whether they affect buildbots or not, the patch author should be responsible for fixing the issue. In my experience, this is also usually what happens when I report a regression.

I think the responsibility of the regression reporter (which could be a buildbot) is to provide a reasonably prompt notification of failure along with clear instructions for reproducing the issue. If those criteria are met, then I think it is always fair to ask for a revert if the issue can't be fixed in a few days. Even without a prompt notification, though, I still think reverts are appropriate in some cases and that the patch author should take the lead in fixing the issue.

The buildbots that automatically send notifications, though, are a little bit of a special case, because when they are broken it affects everybody. I think for those bots, having an immediate revert-to-green policy, like we do now, is fine.

I don't think it's reasonable to expect developers to monitor nightly testers, so maybe this part of the developer policy should be changed. There is almost always at least one bot that is red.

-Tom
Reid Kleckner via llvm-dev
2019-Feb-19 23:29 UTC
[llvm-dev] Clarification on expectations of buildbot email notifications
I don't think whether a buildbot sends email should have anything to do with whether we revert to green or not. Very often, developers commit patches that cause regressions not caught by our buildbots. If the regression is severe enough, then I think community members have the right, and perhaps responsibility, to revert the change that caused it. Our team maintains bots that build chrome with trunk versions of clang, and we identify many regressions this way and end up doing many reverts as a result. I think it's important to continue this practice so that we don't let multiple regressions pile up.

I think what's important, and what we should, after this discussion concludes, put in the developer policy, is that the person doing the revert has the responsibility to do their best to help the patch author reproduce the problem or at least understand the bug.

This can take many forms. They can link directly to an LLVM buildbot, which should be self-explanatory as far as reproduction goes. It can be an unreduced crash report. If they're nice, they can use CReduce to make it smaller. But a reverter can't just say "Revert rNNN, breaks $RANDOM_PROJECT on x86_64-linux-gnu". If they add "reduction forthcoming" and they deliver on that promise, I think we should support that.

In other words, the bar to revert should be low, so we can do it fast and save downstream consumers time and effort. If someone isn't making a good faith effort to follow up after a revert, then authors have a right to push back.

I agree with Paul that we should remove the text about checking nightly builders. That suggestion seems a bit dated.
Sjoerd Meijer via llvm-dev
2019-Feb-20 09:32 UTC
[llvm-dev] Clarification on expectations of buildbot email notifications
I think we could/should be a little bit more precise here:

> ... any regressions whether they affect buildbots or not, the
> patch author should be responsible for fixing the issue.

especially if we say that the bar for a revert is low. That is, the "any regression" needs a bit more clarification. Assuming we are talking about performance regressions (not language conformance or correctness):

1) We sometimes see regressions where code generation is (almost) the same, but the code layout is different. Some micro-architectures are more sensitive to this than others, causing significant regressions. We always thought it was unfair to ask for a revert for these kinds of regressions, and thus never ask for that.

2) We also sometimes see that patches that cause regressions actually do the right thing, but have all sorts of knock-on effects, e.g. causing different codegen and regressions. Sometimes this is just unlucky (e.g. regalloc making different decisions), but sometimes other passes can't handle the IR or machine code as efficiently, and something actually needs to be fixed. But we also very rarely ask for a revert in these cases.

3) The obvious and straightforward case is when a patch is not doing the right thing or e.g. forgets certain cases. Usually what we do is leave a comment on the Phab ticket, and when the author responds fast and works on a fix we can live with the regression for a few days (but it looks like we could be a bit more aggressive with reverts if we wanted to).

The straightforward cases are 1) and 3), where the former is not worth a revert (but it would be good to be explicit about this), and the latter is definitely worth a revert. 2) is the tricky one, because it has a lot of grey areas.

I guess the reason why we are not very aggressive with reverts is that we don't want to stop others from making progress, and we also thought that in some cases it was just our problem and not the author's. In the example of knock-on effects and some heuristic making a different/wrong decision, I thought it was unfair to the author to ask for a revert. A more aggressive revert policy here could easily lead to people making no progress, or progressing a lot more slowly.

Cheers,
Sjoerd.
Chandler Carruth via llvm-dev
2019-Feb-20 10:11 UTC
[llvm-dev] Clarification on expectations of buildbot email notifications
On Tue, Feb 19, 2019 at 1:29 PM Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> I don't think whether a buildbot sends email should have anything to do
> with whether we revert to green or not. Very often, developers commit
> patches that cause regressions not caught by our buildbots. If the
> regression is severe enough, then I think community members have the right,
> and perhaps responsibility, to revert the change that caused it. Our team
> maintains bots that build chrome with trunk versions of clang, and we
> identify many regressions this way and end up doing many reverts as a
> result. I think it's important to continue this practice so that we don't
> let multiple regressions pile up.
>
> I think what's important, and what we should, after this discussion
> concludes, put in the developer policy, is that the person doing the revert
> has the responsibility to do their best to help the patch author reproduce
> the problem or at least understand the bug.
>
> This can take many forms. They can link directly to an LLVM buildbot,
> which should be self-explanatory as far as reproduction goes. It can be an
> unreduced crash report. If they're nice, they can use CReduce to make it
> smaller. But, a reverter can't just say "Revert rNNN, breaks
> $RANDOM_PROJECT on x86_64-linux-gnu". If they add, "reduction forthcoming"
> and they deliver on that promise, I think we should support that.
>
> In other words, the bar to revert should be low, so we can do it fast and
> save downstream consumers time and effort. If someone isn't making a good
> faith effort to follow up after a revert, then authors have a right to push
> back.

I really strongly endorse this approach.

This, IMO, is the crux of revert-to-green: somewhat regardless of the source of green vs. red, we need to revert quickly and with a relatively low bar. The result of a revert is a shared obligation between reverter and author to find a path forward, and Reid nicely outlines how the reverter can address their end of the bargain.

I want to emphasize that "quickly" here often (but definitely not always) needs to be much shorter than "a few days" or even "a day", due to the rate of incoming patches and the need to minimize compound failures hiding precise regression signal.

Anyways, +1 =]
-Chandler
via llvm-dev
2019-Feb-20 15:39 UTC
[llvm-dev] Clarification on expectations of buildbot email notifications
Reid said:

> I don't think whether a buildbot sends email should have anything to do
> with whether we revert to green or not. Very often, developers commit
> patches that cause regressions not caught by our buildbots. If the
> regression is severe enough, then I think community members have the
> right, and perhaps responsibility, to revert the change that caused it.
> Our team maintains bots that build chrome with trunk versions of clang,
> and we identify many regressions this way and end up doing many reverts
> as a result. I think it's important to continue this practice so that
> we don't let multiple regressions pile up.

My team also has internal bots and we see breakages way more often than we'd like. We are a bit reluctant to just go revert something, though, and typically try to engage the patch author first. Engaging the author has a couple of upsides: it respects the author's contribution and attention to the process; and once you've had to fix a particular problem yourself (rather than having someone else clean up your mess) you are less likely to repeat that mistake.

> I think what's important, and what we should, after this discussion
> concludes, put in the developer policy, is that the person doing the
> revert has the responsibility to do their best to help the patch author
> reproduce the problem or at least understand the bug.
>
> This can take many forms. They can link directly to an LLVM buildbot,
> which should be self-explanatory as far as reproduction goes. It can be
> an unreduced crash report. If they're nice, they can use CReduce to make
> it smaller. But, a reverter can't just say "Revert rNNN, breaks
> $RANDOM_PROJECT on x86_64-linux-gnu". If they add, "reduction forthcoming"
> and they deliver on that promise, I think we should support that.
>
> In other words, the bar to revert should be low, so we can do it fast
> and save downstream consumers time and effort. If someone isn't making
> a good faith effort to follow up after a revert, then authors have a
> right to push back.

We have been on the wrong side of a revert where it was "this broke us" and then nothing. I was inclined to just re-apply the patch, but that's my "Mr Grumpy" avatar talking. How do we address failure to conform to the community norms?

> I agree with Paul that we should remove the text about checking nightly
> builders. That suggestion seems a bit dated.

That was Tom Stellard, not me, but I agree with him.

--paulr