thr3ads.net - llvm dev - [llvm-dev] Responsibilities of a buildbot owner [Jan 2022]

Mehdi AMINI via llvm-dev

2022-Jan-13 05:18 UTC

[llvm-dev] Responsibilities of a buildbot owner

On Wed, Jan 12, 2022 at 7:33 PM Galina Kistanova <gkistanova at gmail.com>
wrote:
> Hello everyone,
>
> In continuation of the Responsibilities of a buildbot owner thread.
>
> First of all, thank you very much for being buildbot owners! This is much
> appreciated.
> Thank you for bringing good points to the discussion.
>
> It is expected that buildbot owners own bots which are reliable,
> informative and helpful to the community.
>
> Effectively that means if a problem is detected by a builder and it is
> hard to pinpoint the reason of the issue and a commit to blame, a buildbot
> owner is natively on the escalation path. Someone has to get to the root of
> the problem and fix it one way or another (by reverting the commit, or by
> proposing a patch, or by working with the author of the commit which
> introduced the issue). In the majority of the cases someone takes care of
> an issue. But sometimes it takes a buildbot owner to push. Every buildbot
> owner does this from time to time.
>
> Hi Mehdi,
>
> > Something quite annoying with staging is that it does not have (as far
> > as I know) a way to continue to notify the buildbot owner.
>
> You mentioned this recently in one of the reviews. With
> https://github.com/llvm/llvm-zorg/commit/3c5b8f5bbc37076036997b3dd8b0137252bcb826
> in place, you can add the tag "silent" to your production builder, and it
> will not send notifications to the blame list. You can set the exact
> notifications you want in the master/config/status.py for that builder.
> Hope this helps you.
>
Fantastic! I'll use this for the next steps for my bots (when I get back to
it, I slacked on this recently...) :)

We may also use this on flaky bots in the future?

Thanks,

-- 
Mehdi

> I do not want to have the staging even able to send emails. We debug and
> test many things there, including notifications, and there is always a risk
> of spam.
>
> Thanks
>
> Galina
>
> On Sun, Jan 9, 2022 at 6:07 PM David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> +1 to most of what Mehdi's said here - I'd love to see improvements in
>> stability, though probably having some rigid delegation of responsibility
>> (rather than relying on developers to judge whether it's a flaky test or
>> flaky bot - that isn't always obvious, maybe it's only flaky on a
>> particular configuration that that buildbot happens to test and the
>> developer doesn't have access to - then which is it?) might help (eg: if
>> it's at all unclear, then the assumption is that it's always the test or
>> always the buildbot owner - and an expectation that the author or owner
>> then takes responsibility for working with the other party to address the
>> issue, etc).
>>
>> That all said, disabling individual tests may risk no one caring enough
>> to re-enable them, especially when the flakiness is found long after the
>> change is made that introduced the test or flakiness (usually the case with
>> flakiness - it takes a while to become apparent) - I don't really know how
>> to address that issue. The "convenience" with disabling a buildbot is that
>> there's other value to the buildbot (other than the flaky test that was
>> providing negative value), so buildbot owners have more motivation to get
>> the bot back online - though I don't want to burden buildbot owners unduly
>> either (because they'd eventually give up on them) :/
>>
>> - Dave
>>
>> On Sat, Jan 8, 2022 at 5:15 PM Mehdi AMINI via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi,
>>>
>>> First: thanks a lot Stella for being a bot owner and providing valuable
>>> resources to the community. The sequence of events is really unfortunate
>>> here, and thank you for bringing it up to everyone's attention, let's try
>>> to improve our processes.
>>>
>>> On Sat, Jan 8, 2022 at 1:01 PM Philip Reames via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Stella,
>>>>
>>>> Thank you for raising the question.  This is a great discussion for us
>>>> to have publicly.
>>>>
>>>> So folks know, I am the individual Stella mentioned below.  I'll start
>>>> with a bit of history so that everyone's on the same page, then dive into
>>>> the policy question.
>>>>
>>>> My general take is that buildbots are only useful if failure
>>>> notifications are generally actionable.  A couple months back, I was on the
>>>> edge of setting up mail filter rules to auto-delete a bunch of bots because
>>>> they were regularly broken, and decided I should try to be constructive
>>>> first.  In the first wave of that, I emailed a couple of bot owners about
>>>> things which seemed like false positives.
>>>>
>>>> At the time, I thought it was the bot owner's responsibility to not be
>>>> testing a flaky configuration.  I got a bit of push back on that from a
>>>> couple sources - Stella was one - and put that question on hold.  This
>>>> thread is a great opportunity to decide what our policy actually is, and
>>>> document it.
>>>>
>>>> In the meantime, I've been working with Galina to document existing
>>>> practice where we could, and to try to identify best practices on setting
>>>> up bots.  These changes have been posted publicly, and reviewed through the
>>>> normal process.  We've been deliberately trying to stick to
>>>> non-controversial stuff as we got the docs improved.  I've been actively
>>>> reaching out to bot owners to gather feedback in this process, but Stella
>>>> had not, yet, been one.
>>>>
>>>> Separately, this week I noticed a bot which was repeatedly toggling
>>>> between red and green.  I forget the exact ratio, but in the recent build
>>>> history, there were multiple transitions, seemingly unrelated to the
>>>> changes being committed.  I emailed Galina asking her to address, and she
>>>> removed the buildbot until it could be moved to the staging buildmaster,
>>>> addressed, and then restored.  I left Stella off the initial email.  Sorry
>>>> about that, no ill intent, just written in a hurry.
>>>>
>>>> Now, transitioning into a bit of policy discussion...
>>>>
>>>> From my conversations with existing bot owners, there is a general
>>>> agreement that bots should only be notifying the community if they are
>>>> stable enough.  There's honest disagreement on what the bar for stable
>>>> enough is, and disagreement about exactly whose responsibility addressing
>>>> new instability is.  (To be clear, I'd separate instability from a clear
>>>> deterministic breakage caused by a commit - we have a lot more agreement on
>>>> that.)
>>>>
>>>> My personal take is that for a bot to be publicly notifying, "someone"
>>>> needs to take the responsibility to backstop the normal revert to green
>>>> process.  This "someone" can be developers who work in a particular area,
>>>> the bot owner, or some combination thereof.  I view the responsibility of
>>>> the bot config owner as being the person responsible for making sure that
>>>> backstopping is happening.  Not necessarily by doing it themselves, but by
>>>> having the contacts with developers who can, and following up when the
>>>> normal flow is not working.
>>>>
>>>> In this particular example, we appear to have a bunch of flaky lldb
>>>> tests.  I personally know absolutely nothing about lldb.  I have no idea
>>>> whether the tests are badly designed, the system they're being run on isn't
>>>> yet supported by lldb, or if there's some recent code bug introduced which
>>>> causes the failure.  "Someone" needs to take the responsibility of figuring
>>>> that out, and in the meantime spamming developers with inactionable failure
>>>> notices seems undesirable.
>>>>
>>>
>>> I generally agree with the overall sentiment. I would add that something
>>> worth differentiating is that the source of flakiness can be coming from
>>> the bot itself (flaky hardware / fragile setup), or from the test/codebase
>>> itself (a flaky bot may just be a deterministic ASAN failure).
>>> Of course from Philip's point of view it does not matter: the effect on
>>> the developer is similar, we get undesirable and unactionable
>>> notifications. From the maintenance flow however, it matters in that the
>>> "someone" who has to take responsibility is often not the same group of
>>> folks.
>>> Also when encountering flaky tests, the best action may not be to
>>> disable the bot itself but instead to disable the test itself! (and file a
>>> bug against the test owner...).
>>>
>>> One more dimension that seems to surface here may be different practices
>>> or expectations across subprojects, for example here the LLDB folks may be
>>> used to having some flaky tests, but they trigger on changes to LLVM
>>> itself, where we may not expect any flakiness (or so).
>>>
>>>
>>>> For context, the bot was disabled until it could be moved to the
>>>> staging buildmaster.  Moving to staging is required (currently) to disable
>>>> developer notification.  In the email from Galina, it seems clear that the
>>>> bot would be fine to move back to production once the issue was triaged.
>>>> This seems entirely reasonable to me.
>>>>
>>>
>>> Something quite annoying with staging is that it does not have (as far
>>> as I know) a way to continue to notify the buildbot owner. I don't really
>>> care about staging vs prod as much as having a mode to just "not notify the
>>> blame list" / "only notify the owner".
>>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>> Philip
>>>>
>>>> p.s. One thing I'll note as a definite problem with the current system
>>>> is that a lot of this happens in private email, and it's hard to share so
>>>> that everyone has a good picture of what's going on.  It makes
>>>> miscommunications all too easy.  Last time I spoke with Galina, we were
>>>> tentatively planning to start using github issues for bot operation matters
>>>> to address that, but as that was in the middle of the transition from
>>>> bugzilla, we deferred and haven't gotten back to that yet.
>>>>
>>>> p.p.s. The bot in question is
>>>> https://lab.llvm.org/buildbot/#/builders/83 if folks want to examine
>>>> the history themselves.
>>>>
>>>> On 1/8/22 12:06 PM, Stella Stamenova via llvm-dev wrote:
>>>>
>>>> Hey all,
>>>>
>>>> I have a couple of questions about what the responsibilities of a
>>>> buildbot owner are. I’ve been maintaining a couple of buildbots for lldb
>>>> and mlir for some time now and I thought I had a pretty good idea of what
>>>> is required based on the documentation here: How To Add Your Build
>>>> Configuration To LLVM Buildbot Infrastructure — LLVM 13 documentation
>>>> <https://www.llvm.org/docs/HowToAddABuilder.html>
>>>>
>>>> My understanding was that there are some things that are **expected**
>>>> of the owner. Namely:
>>>>
>>>>    1. Make sure that the buildbot is connected and has the right
>>>>    infrastructure (e.g. the right version of Python, or tools, etc.). Update
>>>>    as needed.
>>>>    2. Make sure that the build configuration is one that is supported
>>>>    (e.g. supported flavor or cmake variables). Update as needed.
>>>>
>>>> There are also a couple of things that are **optional**, but nice to
>>>> have:
>>>>
>>>>    1. If the buildbot stays red for a while (where “a while” is
>>>>    completely subjective), figure out the patch or patches that are causing an
>>>>    issue and either revert them or notify the authors, so they can take action.
>>>>    2. If someone is having trouble investigating a failure that only
>>>>    happens on the buildbot (or the buildbot is a rare configuration), help
>>>>    them out (e.g. collect logs if possible).
>>>>
>>>> Up to now, I’ve not had any issues with this and the community has been
>>>> very good at fixing issues with builds and tests when I point them out, or
>>>> more often than not, without me having to do anything but the occasional
>>>> test re-run and software update (like this one, for example, ⚙ D114639
>>>> Raise the minimum Visual Studio version to VS2019 (llvm.org)
>>>> <https://reviews.llvm.org/D114639>). lldb has some tests that are
>>>> flaky because of the nature of the product, so there is some noise, but
>>>> mostly things work well and everyone seems happy.
>>>>
>>>> I’ve recently run into a situation that makes me wonder whether there
>>>> are other expectations of a buildbot owner that are not explicitly listed
>>>> in the llvm documentation. Someone reached out to me some time ago to let
>>>> me know their unhappiness at the flakiness of some of the lldb tests and
>>>> demanded that I either fix them or disable them. I let them know that there
>>>> are some tests that are known to be flaky, that my expectation is that it
>>>> is not my responsibility to fix all such issues and that the community
>>>> would be very happy to have their contribution in the form of a fix or a
>>>> change to disable the tests. I didn’t get a response from this person, but
>>>> I did disable a couple of particularly flaky tests since it seemed like the
>>>> nice thing to do.
>>>>
>>>> The real excitement happened yesterday when I received an email that **the
>>>> build bot had been turned off**. This same person reached out to the
>>>> powers that be (without letting me know) and asked them explicitly to
>>>> silence it **without my active involvement** because of the flakiness.
>>>>
>>>> I have a couple of issues with this approach but perhaps I’ve
>>>> misunderstood what my responsibilities are as the buildbot owner. I know it
>>>> is frustrating to see a bot fail because of flaky tests and it is nice to
>>>> have someone to ask to resolve them all – is that really the expectation of
>>>> a buildbot owner? Where is the line between maintenance of the bot and
>>>> fixing build and test issues for the community?
>>>>
>>>> I’d like to understand what the general expectations are and if there
>>>> are things missing from the documentation, I propose that we add them, so
>>>> that it is clear for everyone what is required.
>>>>
>>>> Thanks,
>>>>
>>>> -Stella
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Galina Kistanova via llvm-dev

2022-Jan-13 07:24 UTC

[llvm-dev] Responsibilities of a buildbot owner

> We may also use this on flaky bots in the future?
Yes, we may.
Or we may try to do our best to fix them. :)

Moving workers to the staging temporarily to investigate and address an
issue is fine. Gives a bit more elbow room for experimenting, as we can
apply experimental patches there, restart the staging as needed and often,
and so on. Which is not the case with the production. It does not take much
effort to move a worker between the staging and the production areas - a
simple edit of the buildbot.tac file and a worker restart.
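
To make the buildbot.tac edit above a little more concrete, here is a minimal Python sketch. It assumes the worker's buildbot.tac holds a top-level `port = <number>` assignment (as files generated by `buildbot-worker create-worker` normally do) and that the staging and production masters differ only in that port. The helper name `retarget_worker` is hypothetical and not part of any LLVM or Buildbot tooling.

```python
import re

# Matches a top-level assignment such as "port = 9990" on its own line.
# Assumption: the tac file sets the master port this way exactly once.
_PORT_LINE = re.compile(r"^port[ \t]*=[ \t]*\d+[ \t]*$", flags=re.MULTILINE)


def retarget_worker(tac_text: str, new_port: int) -> str:
    """Return tac_text with its `port = N` assignment pointing at new_port."""
    updated, count = _PORT_LINE.subn(f"port = {new_port}", tac_text)
    if count != 1:
        # Refuse to guess if the file does not look like we expect.
        raise ValueError(f"expected exactly one port assignment, found {count}")
    return updated
```

The port numbers themselves are not given in this thread, so take them from the lab documentation. The worker still has to be restarted after the file is rewritten.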

Tagging a builder "silent" means there is a designated person or a team who
is actively fixing the detected issues or acting as a proxy to handle the
blame list. This could be a way to deal with flaky bots, indeed, assuming
there is somebody taking care of those builders, not just a way to skip the
annoyance and keep the status quo.
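
The intended effect of the "silent" tag can be modelled with a small Python sketch. This is not the llvm-zorg implementation: the `Builder` class, the `recipients` function, and the rule that owners are always notified are assumptions made for illustration. The only behaviour taken from this thread is that a builder tagged "silent" does not notify the blame list.

```python
from dataclasses import dataclass, field


@dataclass
class Builder:
    # Hypothetical model of a builder entry; not the real config schema.
    name: str
    owners: list[str]
    tags: set[str] = field(default_factory=set)


def recipients(builder: Builder, blame_list: list[str]) -> list[str]:
    """Decide who is mailed about a failed build on this builder."""
    if "silent" in builder.tags:
        # Silent builders keep the owner informed but spare the committers.
        return sorted(set(builder.owners))
    # Assumption: otherwise both owners and the blame list are notified.
    return sorted(set(builder.owners) | set(blame_list))
```

With this model, an owner or designated team can keep triaging a flaky builder on production while committers stop receiving failure mail they cannot act on.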

By the way, thanks everyone for the constructive and polite discussion! It
seems we are going to have a more stable and informative Windows LLDB
builder.

Galina


