thr3ads.net - llvm dev - [llvm-dev] Clarification on expectations of buildbot email notifications [Feb 2019]

Zachary Turner via llvm-dev

2019-Feb-19 19:21 UTC

[llvm-dev] Clarification on expectations of buildbot email notifications

Hi all,

Over the past year or so, all of us have broken the buildbots on many
occasions.  Usually we get notified on IRC, or via a buildbot email
notification sent to everyone on the blamelist.
If I happen to be on IRC I'll see the notification, but if not, the next
best thing is an email that was automatically sent to me (along with
everyone else on the blamelist) from the buildbot with information about
the failure.
And then finally, I'll occasionally get a response to my commit message
telling me that it's broken, and the patch may be reverted with information
in the commit message explaining which bot was broken and providing a link
to it.

However, we have some buildbots on the public waterfall which are
specifically configured not to send emails to people.  In some cases it's
because the bots are experimental, but there are a handful where the
reasoning I've been given is that it "wastes peoples time and contributes
to build blindness", but we are still expected to keep them green (usually
by people manually reaching out to us when they fail, or patches getting
reverted and us getting notified of the revert).

It is this last case that I'm concerned about, as it appears to be in
direct conflict with our own developer policy
[https://llvm.org/docs/DeveloperPolicy.html#id14], which states this:
-----
We prefer for this to be handled before submission but understand that it
isn’t possible to test all of this for every submission. Our build bots and
nightly testing infrastructure normally finds these problems. A good rule
of thumb is to check the nightly testers for regressions the day after your
change. Build bots will directly email you if a group of commits that
included yours caused a failure. You are expected to check the build bot
messages to see if they are your fault and, if so, fix the breakage.

Commits that violate these quality standards (e.g. are very broken) may be
reverted. This is necessary when the change blocks other developers from
making progress. The developer is welcome to re-commit the change after the
problem has been fixed.

-----

I'm sending this email to get a sense of the community's views on this
matter.  If I'm correctly reading between the lines in the above passage,
buildbots which do not send emails should not be subject to the
revert-to-green policy.  To be honest, it's actually not even clear from
reading the above passage where the burden of fixing a "broken" patch on a
silent buildbot lies at all - with the patch author or with the bot
maintainer.


Would anyone care to weigh in with an unbiased opinion here?

Tom Stellard via llvm-dev

2019-Feb-19 20:02 UTC

I think that for any regression, whether it affects buildbots or not, the
patch author should be responsible for fixing the issue.  In my experience,
this is also usually what happens when I report a regression.

I think the responsibility of the regression reporter (which could be
a buildbot) is to provide a reasonably prompt notification of failure along
with clear instructions for reproducing the issue.  If those criteria are
met, then I think it is always fair to ask for a revert if the issue can't
be fixed in a few days.

Even without a prompt notification, though, I still think reverts are
appropriate in some cases and that the patch author should take the lead in
fixing the issue.

The buildbots that automatically send notifications, though, are a little
bit of a special case, because when they are broken it affects everybody.
I think for those bots, having an immediate revert-to-green policy, like
we do now, is fine.

I don't think it's reasonable to expect developers to monitor nightly
testers, so maybe this part of the developer policy should be changed.
There is almost always at least one bot that is red.

-Tom

Reid Kleckner via llvm-dev

2019-Feb-19 23:29 UTC

I don't think whether a buildbot sends email should have anything to do
with whether we revert to green or not. Very often, developers commit
patches that cause regressions not caught by our buildbots. If the
regression is severe enough, then I think community members have the right,
and perhaps responsibility, to revert the change that caused it. Our team
maintains bots that build chrome with trunk versions of clang, and we
identify many regressions this way and end up doing many reverts as a
result. I think it's important to continue this practice so that we don't
let multiple regressions pile up.

I think what's important, and what we should, after this discussion
concludes, put in the developer policy, is that the person doing the revert
has the responsibility to do their best to help the patch author reproduce
the problem or at least understand the bug.

This can take many forms. They can link directly to an LLVM buildbot, which
should be self-explanatory as far as reproduction goes. It can be an
unreduced crash report. If they're nice, they can use CReduce to make it
smaller. But, a reverter can't just say "Revert rNNN, breaks
$RANDOM_PROJECT on x86_64-linux-gnu". If they add, "reduction forthcoming"
and they deliver on that promise, I think we should support that.

In other words, the bar to revert should be low, so we can do it fast and
save downstream consumers time and effort. If someone isn't making a good
faith effort to follow up after a revert, then authors have a right to push
back.

I agree with Paul that we should remove the text about checking nightly
builders. That suggestion seems a bit dated.


Sjoerd Meijer via llvm-dev

2019-Feb-20 09:32 UTC

I think we could/should be a little bit more precise here:
> ... any regressions whether they affect buildbots or not, the
> patch author should be responsible for fixing the issue.
especially if we say that the bar for a revert is low. That is, the "any
regression" needs a bit more clarification. Assuming we are talking about
performance regressions (not language conformance or correctness):

1) We sometimes see regressions where code generation is (almost) the same, but
the code layout is different. Some micro-architectures are more sensitive to
this than others, causing significant regressions. We always thought it was
unfair to ask for a revert for these kinds of regressions, and thus never ask
for that.

2) We also sometimes see that patches that cause regressions actually do the
right thing, but have all sorts of knock-on effects, e.g. causing different
codegen and regressions. Sometimes this is just unlucky (e.g. regalloc making
different decisions), but sometimes other passes handle the resulting IR or
machine code less efficiently and something actually needs to be fixed. But we
also very rarely ask for a revert in these cases.

3) The obvious and straightforward case is when a patch is not doing the right
thing or e.g. forgets certain cases. Usually what we do is leave a comment on
the Phab ticket, and when the author responds fast and works on a fix we can
live with the regression for a few days (but it looks like we could be a bit
more aggressive with reverts if we wanted to).

The straightforward cases are 1) and 3), where the former is not worth a revert
(but it would be good to be explicit about this), and 3) is definitely worth a
revert.

2) is the tricky one, because it has a lot of grey areas. I guess the reason why
we are not very aggressive with reverts is that we don't want to stop others
from making progress, and also thought that in some cases it was just our
problem and not the author's. In the example of knock-on effects and some
heuristic making a different/wrong decision, I thought it was unfair to the
author to ask for a revert. A more aggressive revert policy here could easily
lead to people making no progress, or much slower progress.

Cheers,
Sjoerd.


Chandler Carruth via llvm-dev

2019-Feb-20 10:11 UTC

On Tue, Feb 19, 2019 at 1:29 PM Reid Kleckner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I don't think whether a buildbot sends email should have anything to do
> with whether we revert to green or not. Very often, developers commit
> patches that cause regressions not caught by our buildbots. If the
> regression is severe enough, then I think community members have the right,
> and perhaps responsibility, to revert the change that caused it. Our team
> maintains bots that build chrome with trunk versions of clang, and we
> identify many regressions this way and end up doing many reverts as a
> result. I think it's important to continue this practice so that we don't
> let multiple regressions pile up.
>
> I think what's important, and what we should, after this discussion
> concludes, put in the developer policy, is that the person doing the revert
> has the responsibility to do their best to help the patch author reproduce
> the problem or at least understand the bug.
>
> This can take many forms. They can link directly to an LLVM buildbot,
> which should be self-explanatory as far as reproduction goes. It can be an
> unreduced crash report. If they're nice, they can use CReduce to make it
> smaller. But, a reverter can't just say "Revert rNNN, breaks
> $RANDOM_PROJECT on x86_64-linux-gnu". If they add, "reduction forthcoming"
> and they deliver on that promise, I think we should support that.
>
> In other words, the bar to revert should be low, so we can do it fast and
> save downstream consumers time and effort. If someone isn't making a good
> faith effort to follow up after a revert, then authors have a right to push
> back.
>
>
I really strongly endorse this approach. This, IMO, is the crux of
revert-to-green: somewhat regardless of the source of green vs. red, we
need to revert quickly and with a relatively low bar. The result of a revert
is a shared obligation between reverter and author to find a path forward,
and Reid nicely outlines how the reverter can address their end of the
bargain.

I want to emphasize that "quickly" here often (but definitely not always)
needs to be much shorter than "a few days" or even "a day" due to the rate
of incoming patches and the need to minimize compound failures hiding
precise regression signal.

Anyways, +1 =]
-Chandler


via llvm-dev

2019-Feb-20 15:39 UTC

Reid said:
> I don't think whether a buildbot sends email should have anything to do
> with whether we revert to green or not. Very often, developers commit
> patches that cause regressions not caught by our buildbots. If the
> regression is severe enough, then I think community members have the
> right, and perhaps responsibility, to revert the change that caused it.
> Our team maintains bots that build chrome with trunk versions of clang,
> and we identify many regressions this way and end up doing many reverts
> as a result. I think it's important to continue this practice so that
> we don't let multiple regressions pile up.
My team also has internal bots and we see breakages way more often than
we'd like.  We are a bit reluctant to just go revert something, though,
and typically try to engage the patch author first.

Engaging the author has a couple of up-sides: it respects the author's
contribution and attention to the process; and once you've had to fix
a particular problem yourself (rather than someone else cleaning up
after your mess) you are less likely to repeat that mistake.
> I think what's important, and what we should, after this discussion
> concludes, put in the developer policy, is that the person doing the
> revert has the responsibility to do their best to help the patch author
> reproduce the problem or at least understand the bug.
>
> This can take many forms. They can link directly to an LLVM buildbot,
> which should be self-explanatory as far as reproduction goes. It can be
> an unreduced crash report. If they're nice, they can use CReduce to make
> it smaller. But, a reverter can't just say "Revert rNNN, breaks
> $RANDOM_PROJECT on x86_64-linux-gnu". If they add, "reduction forthcoming"
> and they deliver on that promise, I think we should support that.
>
> In other words, the bar to revert should be low, so we can do it fast
> and save downstream consumers time and effort. If someone isn't making
> a good faith effort to follow up after a revert, then authors have a
> right to push back.
We have been on the wrong side of a revert where it was "this broke us"
and then nothing. I was inclined to just re-apply the patch, but that's
my "Mr Grumpy" avatar talking. How do we address failure to conform to the
community norms?
> I agree with Paul that we should remove the text about checking nightly
> builders. That suggestion seems a bit dated.
That was Tom Stellard, not me, but I agree with him.
--paulr
