thr3ads.net - llvm dev - [llvm-dev] Buildbot Noise [Oct 2015]

If this information is useful, please help other people find it:
Share via:

Renato Golin via llvm-dev

2015-Oct-07 10:10 UTC

[llvm-dev] Buildbot Noise

Hi David,

I think we're repeating ourselves here, so I'll reduce to the bare
minimum before replying.

On 6 October 2015 at 21:40, David Blaikie <dblaikie at gmail.com>
wrote:> When I suggest someone disable notifications from a bot it's because
those
> notifications aren't actionable to those receiving them.
This is a very limited view of the utility of buildbots.

I think part of the problem is that you're expecting to get instant
value out of something that cannot provide that to you. If you can't
extract value from it, it's worthless.

Also, it seems, you're associating community buildbots with company
testing infrastructure. When I worked at big companies, there were
validation teams that would test my stuff and deal with *any* noise on
their own, and only the real signal would come to me: 100% actionable.
However, most of the bot owners in open source communities do this as
a secondary task. This has always been the case and until someone
(LLVM Foundation?) starts investing in a better infrastructure overall
(multi master, new slaves, admins), there isn't much we can do to
improve that quick enough.

The alternative is that the less common architectures will always have
noisier bots because less people use them day-to-day, during their
development time. By having a hard line on those, in the long run,
means we'll disable most testing on all secondary architectures, and
LLVM becomes an Intel compiler. But many companies use LLVM for their
production compiler on their own targets, so the inevitable is that
they will *fork* LLVM. I don't think anyone wants that.

> I'm not suggesting removing the testing. Merely placing the onus on
> responding to/investigating notifications on the parties with the context
to
> do so.
You still don't get the point. This would make sense on a world where
all parties are equal.

Most people develop and test on x86, even ARM and MIPS engineers. That
means x86 is almost always stable, no matter who's working.

But some bugs that we had to fix this year show up randomly *only* on
ARM. That was a serious misuse of the Itanium C++ ABI, and one that
took a long time to be fixed, and we still don't know if we got them
all.

Bugs like that normally only show up on self-hosting builds, sometimes
on self-hosted Clang compiled test-suite. These bugs have no hard
good/bad line for bisecting, they take hours per cycle, and they may
or may not fail, so automated bisecting won't work. Furthermore, there
is nothing to XFAIL in this case, unless you want to disable building
Clang, which I don't think you do.

While it's taking days, if not weeks, to investigate this bot, the
status may be going from red to green to red. It would be very
simplistic to assume that *any* greed->red transition while I'm
bisecting the problem will be due to the current known instability. It
could be anything, and developers still need to be warned if the alarm
goes off.

The result may be it's still flaky, the developer can't do much, life
goes on. Or it could be his test, he fixes immediately, and I'm
eternally grateful, because I still need to investigate *only one* bug
at a time. By silencing the bot, I'd have to be responsible for
debugging the original hard problem plus any other that would come
while the bot was flaky.

Now, there's the issue of where does the responsibility lies...

I'm responsible for the quality of the ARM code, including the
buildbots. What you're suggesting is that *no matter what* gets
committed, it is *my* responsibility to fix any bug that the original
developers can't *action upon*.

That might seem sensible at first, but the biggest problem here is the
term that you're using over and over again: *acting upon*. It can be a
technical limitation that you can't act upon a bug on an ARM bot, but
it can also be a personal one. I'm not saying *you* would do that, but
we have plenty of people in the community with plenty of their own
problems. You said it yourself, people tend to ignore problems that
they can't understand, but not understanding is *not* the same as not
being able to *act upon*.

For me, that attitude is what's at the core of the problem here. By
raising the bar faster than we can make it better, you're essentially
just giving people the right not to care. The bar will be raised even
further by peer pressure, and that's the kind of behaviour that leads
to a fork. I'm trying to avoid this at all costs.

> All I'd expect is that you/others watch the negative
> bot results, and forward any on that look like actionable true positives.
If
> that's too expensive, then I don't know how you can expect
community members
> to incur that cost instead of bot owners?
Another example of the assumption that bot owners are validation
engineers and that's their only job. It was never like this in LLVM
and it won't start today just because we want to.

My expectation of the LLVM Foundation is that they would take our
validation infrastructure to the next level, but so far I haven't seen
much happening. If you want to make it better, instead of forcing your
way on the existing scenario, why not work with the Foundation to move
this to the next level?

> Once people lose
> confidence in the bots, they're not likely to /gain/ confidence again -
That's not true. Galina's Panda bots were unstable in 2010, people
lost confidence, she added more boards, people re-gained confidence in
2011. Then it became unstable in 2013, people lost confidence, we
fixed the issues, people re-gained confidence only a few months later.
This year it got unstable again, but because we already have enough
ARM bots elsewhere, she disabled them for good.

You're exaggerating the effects of unstable bots as if people expected
them to be always perfect. I'd love if they could be, but I don't
expect them to be.

> I'm looking at the existing behavior of the community - if people are
> generally ignoring the result of a bot anyway (& if it's red for
weeks at a
> time, I think they are) then the notifications are providing no value.
I'm not seeing that, myself. So far, you're the only one that is
shouting out loud that this or that bot is noisy.

Sometimes people ignore bots, but I don't take this as a sign that
everything is doomed, just that people focus on different things at
different times.

>> No user is building trunk every commit (ish). Buildbots are not meant
>> to be as stable as a user (including distros) would require.
>
> I disagree with this - I think it's a worthy goal to have continuous
> validation that is more robust and comprehensive.
A worthy goal, yes. Doable right now, with the resources that we have,
no. And no amount of shouting will get this done.

If we want quality, we need top-level management, preferably from the
LLVM Foundation, and a bunch of dedicated people working on it, which
could be either funded by the foundation or agreed between the
interested parties. If anyone ever get this conversation going (I
tried), please let me know, as I'm very interested in making that
happen.

> red->exception->red I don't mind too much - the
"timeout->timeout" example
> you gave is one I disagree with.
Ah, yes. I mixed them up.

>> I agree in principle. I just worry that it's a lot easier to add an
>> XFAIL than to remove it later.
>
> How so? If you're actively investigating the issue, and everyone else
is
> happily ignoring the bot result (& so won't care when it goes
green, or red
> again) - you're owning the issue to get your bot back to green, and it
just
> means you have to un-XFAIL it as soon as that happens.
>From my experience, companies put people to work on open sourceprojects when they need something done and don't want to bear the
costs of maintaining it later.

So, initially, developers have a high pressure of pushing their
patches through, and you see them very excited in addressing the
review comments, adding tests, fixing bugs.

But once the patch is in, the priority of that task, for that company,
is greatly reduced. Most developers consider investigating an XFAIL
from their commit as important as the commit itself, but not
necessarily most companies do so with the same passion.

Moreover, once developers implement whatever they needed here, it's
not uncommon for their parent companies to move them away from the
project, in which case they can't even contribute any more due to
license issues, etc.

But we also have the not-so-responsible developers, that could create
a bug, assign to themselves, and never look back unless someone
complains.

That's why, at Linaro, I have the policy to only mark XFAIL when I can
guarantee that it's either not supposed to work or the developer will
fix it *before* marking the task closed.

cheers,
--rento

James Y Knight via llvm-dev

2015-Oct-07 14:39 UTC

head link

[llvm-dev] Buildbot Noise

On Oct 7, 2015, at 6:10 AM, Renato Golin via llvm-dev <llvm-dev at
lists.llvm.org> wrote:>> I'm looking at the existing behavior of the community - if people
are
>> generally ignoring the result of a bot anyway (& if it's red
for weeks at a
>> time, I think they are) then the notifications are providing no value.
> 
> I'm not seeing that, myself. So far, you're the only one that is
> shouting out loud that this or that bot is noisy.

I pointed this out as a major issue when I first started with LLVM. But since
nobody actually seemed interested in fixing it, I didn't keep making noise
about it. I basically just ignore the failure notices from buildbot, because
every commit seems to trigger multiple bogus failure notices, no matter what.

It's just so much easier to ignore those notices than to get involved in a
debate about why sending out useless failure notices that have nothing to do
with any of the commits being blamed is actively harmful -- *worse* than
useless.

I don't know what the solution is, but it's got to somehow move towards
trying to avoid blaming committers for already-known problems, or for
infrastructure issues (e.g.: svn update failed? Why do I care?). It simply does
not help to improve the quality of LLVM to have the buildbots send emails to
committers of arbitrary patches when a bot that "everyone" already
knows is flaky has failed yet again. *I* don't know which bots are
"supposed" to be flaky, so if I actually bothered to fully investigate
every such notice, that'd just be a massive waste of effort.

So I just ignore the notices. Sorry...

Renato Golin via llvm-dev

2015-Oct-07 14:45 UTC

head link

[llvm-dev] Buildbot Noise

On 7 October 2015 at 15:39, James Y Knight <jyknight at google.com>
wrote:> But since nobody actually seemed interested in fixing it, I didn't keep
making noise about it. I basically just ignore the failure notices from
buildbot, because every commit seems to trigger multiple bogus failure notices,
no matter what.
That's not true, either.

We (buildbot owners and admins) are constantly improving the noise by
adding more boards, investigating stability issues and disabling bots
temporarily when they're too noisy. We may not do it at the speed some
people expect, or to the extent that a fully supported validation team
in a big company would, but we do the best we can.

> I don't know what the solution is, but it's got to somehow move
towards trying to avoid blaming committers for already-known problems, or for
infrastructure issues (e.g.: svn update failed? Why do I care?). It simply does
not help to improve the quality of LLVM to have the buildbots send emails to
committers of arbitrary patches when a bot that "everyone" already
knows is flaky has failed yet again. *I* don't know which bots are
"supposed" to be flaky, so if I actually bothered to fully investigate
every such notice, that'd just be a massive waste of effort.
The alternative is worse: not testing.

The assumption is wrong: people *do* care, but the problem is harder
than it looks, and needs more than just the bot owner to improve.

I wish I had a magic wand... but I'm not expecting to ever have one.

cheers,
--renato

David Blaikie via llvm-dev

2015-Oct-09 16:28 UTC

head link

[llvm-dev] Buildbot Noise

On Wed, Oct 7, 2015 at 3:10 AM, Renato Golin <renato.golin at linaro.org>
wrote:
> Hi David,
>
> I think we're repeating ourselves here, so I'll reduce to the bare
> minimum before replying.
>
>
> On 6 October 2015 at 21:40, David Blaikie <dblaikie at gmail.com>
wrote:
> > When I suggest someone disable notifications from a bot it's
because
> those
> > notifications aren't actionable to those receiving them.
>
> This is a very limited view of the utility of buildbots.
>
> I think part of the problem is that you're expecting to get instant
> value out of something that cannot provide that to you. If you can't
> extract value from it, it's worthless.
>
Not worthless - just not valuable (negative value, actually - because it
drowns out other signals) to me, and, by extension, probably people like me
(I assume most other random contributors).

> Also, it seems, you're associating community buildbots with company
> testing infrastructure.

Sorry if it came off that way - I'm not sure what I said to imply that.

> When I worked at big companies, there were
> validation teams that would test my stuff and deal with *any* noise on
> their own, and only the real signal would come to me: 100% actionable.
> However, most of the bot owners in open source communities do this as
> a secondary task.

Sure - I own the GDB 7.5 buildbot in this fashion currently.

> This has always been the case and until someone
> (LLVM Foundation?) starts investing in a better infrastructure overall
> (multi master, new slaves, admins), there isn't much we can do to
> improve that quick enough.
>
> The alternative is that the less common architectures will always have
> noisier bots because less people use them day-to-day, during their
> development time.

They will have more to say - but that shouldn't mean low signal/noise. But,
yes, if developers aren't acting on the results because they don't know
how
to (they can't reproduce the issue without hardware, for example) then
it's
not a useful signal for them.

> By having a hard line on those, in the long run,
> means we'll disable most testing on all secondary architectures,

Again, I'm not suggesting disabling testing. I'm suggesting notifying
those
people who are the ones generally taking action on these results. Which, it
sounds like, are you & other people with a specific vested interest in that
platform.

Am I misunderstanding that? Are many of the unique failures reported by
these bots directly acted on by community members who committed the change?

> and
> LLVM becomes an Intel compiler. But many companies use LLVM for their
> production compiler on their own targets, so the inevitable is that
> they will *fork* LLVM. I don't think anyone wants that.
>
I'm not sure how that follows for a variety of reasons. (for one: those
people forking would need to own/maintain/triage/filter their own test
infrastructure results - so if they can do that internally, they can do
that externally)

> > I'm not suggesting removing the testing. Merely placing the onus
on
> > responding to/investigating notifications on the parties with the
> context to
> > do so.
>
> You still don't get the point. This would make sense on a world where
> all parties are equal.
>
> Most people develop and test on x86, even ARM and MIPS engineers. That
> means x86 is almost always stable, no matter who's working.
>
> But some bugs that we had to fix this year show up randomly *only* on
> ARM. That was a serious misuse of the Itanium C++ ABI, and one that
> took a long time to be fixed, and we still don't know if we got them
> all.
>
> Bugs like that normally only show up on self-hosting builds, sometimes
> on self-hosted Clang compiled test-suite. These bugs have no hard
> good/bad line for bisecting, they take hours per cycle, and they may
> or may not fail, so automated bisecting won't work. Furthermore, there
> is nothing to XFAIL in this case, unless you want to disable building
> Clang, which I don't think you do.
>
If it's broken and continues to be broken, what's the point in
continuing
to build it? If that step could be XFAIL'd while you or whomever was
investigating such a failure continues to investigate, that seems like
exactly the right behavior.

> While it's taking days, if not weeks, to investigate this bot, the
> status may be going from red to green to red.

You mean this is a flaky failure - an inconsistent reproduction?

> It would be very
> simplistic to assume that *any* greed->red transition while I'm
> bisecting the problem will be due to the current known instability. It
> could be anything, and developers still need to be warned if the alarm
> goes off.
>
Do they? If the odds are reasonable that it isn't their problem, telling
them about it is hurting the project in other ways (& yes, not telling them
about real problems is hurting the project too, I realize).

What I don't want is to distribute the cost of that unreliable reproduction
to everyone in the community (not even the original committer, since we
don't know who that is, but everyone in the community who commits code). I
would like flaky issues specific to a certain architecture/hardware/bot to
be owned by that bot owner - since it's likely that no one else can really
investigate them effectively.

And the cost of that is that the bot owner then also has to manually triage
failures during that period - I don't find this to be a high cost and I do
this for the GDB bot anyway (I get mail on every failure - I inspect them
all, and if they look like a genuine true positive due to debug info, I
follow up with the commit and check that the author is aware of it - if my
bot stopped sending blame mail today (which wouldn't be an unreasonable
request - it has a moderately high flake rate with GDB tests timing out) it
wouldn't cost me anything - I'm already doing what's necessary
there)

> The result may be it's still flaky, the developer can't do much,
life
> goes on. Or it could be his test, he fixes immediately, and I'm
> eternally grateful, because I still need to investigate *only one* bug
> at a time. By silencing the bot, I'd have to be responsible for
> debugging the original hard problem plus any other that would come
> while the bot was flaky.
>
No, you wouldn't - you'd be responsible for a first level triage, which,
if
the flake rate is moderate, you've already imposed on many more people by
asking them to triage these flakes rather than you - and they have less
context (they don't know what current flaky issues there are, or how they
manifest - so the community as a whole may end up investing several
duplicate efforts to understand that flake as people assume that it's their
fault - or they don't, and they get in the habit of ignoring buildbot
results (from your builder - even when the flakes have gone away: this
hurts you, or from all builders - this hurts the community as a whole))

> Now, there's the issue of where does the responsibility lies...
>
> I'm responsible for the quality of the ARM code, including the
> buildbots. What you're suggesting is that *no matter what* gets
> committed, it is *my* responsibility to fix any bug that the original
> developers can't *action upon*.
>
No - it is your responsibility to get them to a place where they can act
upon it.

> That might seem sensible at first, but the biggest problem here is the
> term that you're using over and over again: *acting upon*. It can be a
> technical limitation that you can't act upon a bug on an ARM bot, but
> it can also be a personal one. I'm not saying *you* would do that, but
> we have plenty of people in the community with plenty of their own
> problems. You said it yourself, people tend to ignore problems that
> they can't understand, but not understanding is *not* the same as not
> being able to *act upon*.
>
> For me, that attitude is what's at the core of the problem here. By
> raising the bar faster than we can make it better, you're essentially
> just giving people the right not to care.

They already have that right - and I believe are exercising it regularly.
Is your experience different? Are most (>90%) of the uniquely reported
issues on your buildbot addressed by the original committer without your
assistance?

> The bar will be raised even
> further by peer pressure, and that's the kind of behaviour that leads
> to a fork. I'm trying to avoid this at all costs.
>
> > All I'd expect is that you/others watch the negative
> > bot results, and forward any on that look like actionable true
> positives. If
> > that's too expensive, then I don't know how you can expect
community
> members
> > to incur that cost instead of bot owners?
>
> Another example of the assumption that bot owners are validation
> engineers and that's their only job.

No, that's not my assumption - my assumption is that bot owners have the
most context for their bot & can provide triage (knowing the common flakes,
failure modes, false positives, etc - because they see them all, rather
than once every few months), reproduction steps, logs, etc.

> It was never like this in LLVM
> and it won't start today just because we want to.
>
It might if that's the only way to contribute a bot - contributing a bot
without an owner that can triage/repro/etc the results seems not good to
me. If the owner can't do it, why would the expect the community to do that
for them?

> My expectation of the LLVM Foundation is that they would take our
> validation infrastructure to the next level, but so far I haven't seen
> much happening. If you want to make it better, instead of forcing your
> way on the existing scenario, why not work with the Foundation to move
> this to the next level?
>
Because this is easier for me, potentially. I want to
express/encourage/drive towards the appropriate signal/noise for bots so
that new contributors (and old) can trust the signal. Then it's up to those
who want to contribute bots to meet that bar. Then owners, presented with
the real cost of the bots falling to them (which I still think isn't that
high - like I said, I triage every GDB 7.5 failure myself already today -
that's the right tradeoff for me, if the issues were greater I would invest
some time in making it more automated so I wouldn't feel a need to do that
(but I know, for example, that the Apple engineers can't look at/run the
test cases from the suite, so it needs some help)), can decide whether
paying the time/effort/engineering to improve the situation is worthwhile -
distributing and duplicating the cost across the project as it is today
makes it hard to see the cost and justify the improvement. Each individual
decides it's not worth their time to try to push against the status quo or
do the work, so nothing happens.

>
>
> > Once people lose
> > confidence in the bots, they're not likely to /gain/ confidence
again -
>
> That's not true. Galina's Panda bots were unstable in 2010, people
> lost confidence, she added more boards, people re-gained confidence in
> 2011.

What behavior did you observe that represents the loss/gain in confidence?

> Then it became unstable in 2013, people lost confidence, we
> fixed the issues, people re-gained confidence only a few months later.
> This year it got unstable again, but because we already have enough
> ARM bots elsewhere, she disabled them for good.
>
> You're exaggerating the effects of unstable bots as if people expected
> them to be always perfect. I'd love if they could be, but I don't
> expect them to be.
>
Not perfect, but a fair bit better than they are - my understanding,
anecdotal as it is, is that a lot of people ignore a lot of bots. That's
not good.

> > I'm looking at the existing behavior of the community - if people
are
> > generally ignoring the result of a bot anyway (& if it's red
for weeks
> at a
> > time, I think they are) then the notifications are providing no value.
>
> I'm not seeing that, myself. So far, you're the only one that is
> shouting out loud that this or that bot is noisy.
>
Yep, most people just seem to ignore them except for a handful or when they
all explode together (oh, look, I checked in a -Asserts -Wunused-variable
build break).

> Sometimes people ignore bots,

Sometimes? It's pretty much impossible to commit without getting at least a
couple of fail-mails. Seems like that's a lot of mail to ignore (especially
from builders that build with coarser granularity than
one-build-per-commit, which is most of them).

> but I don't take this as a sign that
> everything is doomed, just that people focus on different things at
> different times.
>
I don't think everything is doomed - clearly we've survived, released,
etc,
in this state for a while. I take it as a sign that the system is
sub-optimal and I'm trying to make some noise about it where other people
resign themselves to auto-bining most of the buildbot results. (see James's
reply for another example of someone trying to bring this issue up, failing
to get anyone to listen, and giving up/resigning himself to the same
outcome)

>
>
> >> No user is building trunk every commit (ish). Buildbots are not
meant
> >> to be as stable as a user (including distros) would require.
> >
> > I disagree with this - I think it's a worthy goal to have
continuous
> > validation that is more robust and comprehensive.
>
> A worthy goal, yes. Doable right now, with the resources that we have,
> no. And no amount of shouting will get this done.
>
I'm not shouting at anyone. (& to a degree I think it is achievable, if
we
consider the costs in the right places - but my suggestion here isn't even
necessarily to improve/create that world)

> If we want quality, we need top-level management, preferably from the
> LLVM Foundation, and a bunch of dedicated people working on it, which
> could be either funded by the foundation or agreed between the
> interested parties. If anyone ever get this conversation going (I
> tried), please let me know, as I'm very interested in making that
> happen.
>
I don't think it requires top-level management changes to make changes
here. All software is hard & we all work to make different bits of it
better, I don't see why that wouldn't be the case here.

>
>
> > red->exception->red I don't mind too much - the
"timeout->timeout"
> example
> > you gave is one I disagree with.
>
> Ah, yes. I mixed them up.
>
>
> >> I agree in principle. I just worry that it's a lot easier to
add an
> >> XFAIL than to remove it later.
> >
> > How so? If you're actively investigating the issue, and everyone
else is
> > happily ignoring the bot result (& so won't care when it goes
green, or
> red
> > again) - you're owning the issue to get your bot back to green,
and it
> just
> > means you have to un-XFAIL it as soon as that happens.
>
> From my experience, companies put people to work on open source
> projects when they need something done and don't want to bear the
> costs of maintaining it later.
>
> So, initially, developers have a high pressure of pushing their
> patches through, and you see them very excited in addressing the
> review comments, adding tests, fixing bugs.
>
> But once the patch is in, the priority of that task, for that company,
> is greatly reduced. Most developers consider investigating an XFAIL
> from their commit as important as the commit itself, but not
> necessarily most companies do so with the same passion.
>
I'm not necessarily suggesting that the developer put the XFAIL in, but you
could. You've said you are investigating issues -  so while you are
actively investigating the issue, it is expected to fail.

> Moreover, once developers implement whatever they needed here, it's
> not uncommon for their parent companies to move them away from the
> project, in which case they can't even contribute any more due to
> license issues, etc.
>
> But we also have the not-so-responsible developers, that could create
> a bug, assign to themselves, and never look back unless someone
> complains.
>
> That's why, at Linaro, I have the policy to only mark XFAIL when I can
> guarantee that it's either not supposed to work or the developer will
> fix it *before* marking the task closed.

Sure - I'm talking about the current situation that seems to be the process
you're working with. Failure happens on your bot, you investigate (possibly
for hours, days, or weeks) and eventually fix it. I'm suggesting that the
correct thing to do is to XFAIL that test (or revert the commit) as soon as
you start investigating it (I'm happy enough with a fix in the order of
minutes (Oh, look, I used named LLVM IR values, so the test fails on
-Asserts - quick fix, no need to XFAIL/revert, then fix, etc) but anything
longer than that it seems worth the minute or two it would take to get the
build green again)

This is good for you and the project as a whole - in this way, future
failures will actually send mail (the bot will go red again) rather than
hiding all the issues until the issue is fixed hours/days/weeks later.

- David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151009/1f7f4bbe/attachment.html>

Renato Golin via llvm-dev

2015-Oct-09 17:14 UTC

head link

[llvm-dev] Buildbot Noise

I think we've hit a record in the number of inline replies, here... :)

Let's start fresh...

    Problem #1: What is flaky?

The types of failures of a buildbot:

1. failures because of bad hardware / bad software / bad admin
(timeout, disk full, crash, bad RAM)
2. failures because of infrastructure problems (svn, lnt, etc)
3. failures due to previous or external commits unrelated to the blame
list (intermittent, timeout)
4. results that you don't know how to act on, but you have to
5. clear error messages, easy to act on

In my view, "flaky" is *only* number 1. Everything else is signal.

I agree that bots that cause 1. should be silent, and that failures in
2. and 3. should be only emailed to the bot admin. But category 4
still needs to email the blame list and cannot be ignored, even if
*you* don't know how to act on.

Type 2. can easily be separated, but I'm yet to see how are we going
to code in which category each failure lies for types 3. and 4. One
way to work around the problem in 4 is to print the bot owner's name
on the email, so that you know who to reply to, for more details on
what to do. How to decide if your change is unrelated or you didn't
understand is a big problem. Once all bots are low-noise, people will
tend more to 4, until then, to 3 or 1.

In agreement?


    Problem #2: Breakage types

Bots can break for a number of reasons in category 4. Some examples:

A. silly, quick fixed ones, like bad CHECK lines, missing explicit
triple, move tests to target-specific directories, add an include
file.
B. real problems, like an assert in the code, seg fault, bad test results.
C. hard problems, like bad codegen affecting self-hosting,
intermittent failures in test-suite or self-hosted clang.

Problems of type A. tend to show by the firehose on ARM, while they're
a lot less common on x86_64 bots just because people develop on
x86_64. Problems B. and C. and equally common on all platforms due to
the complexity of the compiler.

Problems of type B. should have same behaviour in all platforms. If
the bots are fast enough (either fast hardware, or many hardware), the
blame list should be small and bisect should be quick (<1day). These
are not the problem.

Problems of type C, however, are seriously worse on slow targets. Not
only it's slower to build (sometimes 10x slower than on a decent
server), but the testing is hard to get right (because it's
intermittent), and until you get it right, you're actively working on
that (minus sleep time, etc). Since we're talking about an order of
magnitude slower to debug, sleep time becomes a much bigger issue. If
a hard problem takes about 5 hours on fast hardware, it can take up to
50 hours, and in that case, no one can work that long. If you do 10hs
straight every day, it's still a week past.

In agreement?


I'll continue later, once we're in agreement over the base facts.

cheers,
--renato

llvm dev - Oct 2015 - Buildbot Noise

[llvm-dev] Buildbot Noise

[llvm-dev] Buildbot Noise

[llvm-dev] Buildbot Noise

[llvm-dev] Buildbot Noise

[llvm-dev] Buildbot Noise