On 05/20/2015 11:04 AM, Renato Golin wrote:
> On 20 May 2015 at 18:47, Philip Reames <listmail at philipreames.com> wrote:
>> One particular irritant is getting emails 12-24 hours later about
>> someone else's breakage that has *already been fixed*. The long-cycling
>> bots are really irritating in that respect.
>
> That's not that easy to fix, and I think we'll have to cope with that
> forever. Not all machines are fast, and some buildbots do a full
> self-host, with compiler-rt and running all tests. Others do a full
> benchmark run of LNT, running it 5-8 times, which can take several
> hours on an ARM box.

I agree it's not easy, but it's not something we should just live with
either. There are ways to address the problem, and we should consider them.

As a randomly chosen example, one thing we could do is introduce the
notion of a "last good commit". Fast builders would cycle off ToT;
whenever one (or some subset) passes, that advances the last good commit.
Slower builders would cycle off the last good commit, not ToT. We have all
the mechanisms to implement this today: it could be as simple as parsing
the JSON output of buildbot in the script that runs the slower build bots
and syncing to that revision rather than ToT. (A rough sketch of that idea
follows after this message.)

> The benchmark bots should be marked not to spam, since they're not
> there to pick up errors, but the full self-hosting ones do need to
> warn on errors. For example, right now I have a bug only on a thumbv7a
> self-hosting bot, and not on others. I'm now bisecting it to find the
> culprit, but this is not always clear, as the longer it takes for me
> to realise, the harder it will be to fix it.

At this point, you're long past the point I was grousing about. I'm not
arguing that long-running bots shouldn't notify; I'm arguing they shouldn't
report *obvious* false positives. Also, the bisect step really should be
automated... :)

> The only way out of it is for people to look at the fast bots, and if
> they're fixed, check the commit that did it and see if the slow bot
> has been fixed by the same commit later.

You've now wasted 10 minutes or more of my time per slow, noisy bot. When I
routinely get 10+ builder failure emails for changes that are clean, that's
not a worthwhile investment.

> Buildbot owners will eventually pick those problems up, but as I said,
> the longer it takes, the harder it is to get to the bottom of it, and
> the higher the probability of getting more regressions introduced
> because the bot is red and won't warn.

I agree. All I'm suggesting is reducing the noise so that real failures are
likely to be noticed quickly.

> cheers,
> --renato
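For concreteness, here is a minimal sketch of the "parse the JSON output of
buildbot" idea above, as it might sit in the script that drives a slow bot.
The master URL, builder names, endpoint, and JSON field names are all
illustrative assumptions (they vary by buildbot version), not a description
of any existing LLVM buildbot script.

#!/usr/bin/env python3
# Hypothetical sketch: derive a "last good commit" from the fast builders and
# print the revision a slow bot could sync to instead of ToT.
#
# The master URL, builder names, endpoint, and JSON field names are assumed
# for illustration; check your buildbot version's JSON API for the real ones.

import json
from urllib.request import urlopen

MASTER = "http://lab.llvm.org:8011"                          # assumed master URL
FAST_BUILDERS = ["clang-x86_64-linux-fast", "llvm-x86_64-linux-fast"]  # assumed names
SUCCESS = 0  # buildbot's result code for a successful build

def last_good_revision(builder, lookback=20):
    """Return the newest revision this builder passed on, or None."""
    for i in range(-1, -lookback - 1, -1):   # walk back from the latest build
        url = "%s/json/builders/%s/builds/%d" % (MASTER, builder, i)
        try:
            build = json.loads(urlopen(url).read().decode("utf-8"))
        except Exception:
            return None
        # "results" and "sourceStamps" are assumptions about the JSON layout.
        if build.get("results") == SUCCESS and build.get("sourceStamps"):
            return int(build["sourceStamps"][0]["revision"])
    return None

if __name__ == "__main__":
    good = [r for r in (last_good_revision(b) for b in FAST_BUILDERS) if r]
    if good:
        # "If any one passed, that advances the last good commit": take the
        # newest passing revision. The slow bot's run script would then do
        # something like:  svn update -r <revision>
        print(max(good))

Whether "last good" means the newest revision that any fast builder passed
on, or the newest one that all of them passed on, is a policy choice; the
"any one (or some subset)" wording above suggests the former.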
On 21 May 2015 at 01:52, Philip Reames <listmail at philipreames.com> wrote:
> As a randomly chosen example, one thing we could do is introduce the
> notion of a "last good commit". Fast builders would cycle off ToT;
> whenever one (or some subset) passes, that advances the last good commit.
> Slower builders would cycle off the last good commit, not ToT. We have
> all the mechanisms to implement this today: it could be as simple as
> parsing the JSON output of buildbot in the script that runs the slower
> build bots and syncing to that revision rather than ToT.

Not all slow builders have the same sources as the fast builders. For
example, our "full" builders include compiler-rt, while the fast ones don't.

> At this point, you're long past the point I was grousing about. I'm not
> arguing that long-running bots shouldn't notify; I'm arguing they
> shouldn't report *obvious* false positives.

Well, that's yet another fix we need for all builders. I think we're missing:

1. Detection of infrastructure vs. real code problems. There isn't a simple
   way of doing this, so adding patterns for known "infrastructure" problems
   to be ignored, and treating everything else as an error, would be OK.

2. Detection of different failures. If new tests fail, or the build fails
   instead of the tests, the bot should email *again*. This is very
   problematic, and it is why people get so angry at broken bots.

3. Detection of long-running failures that might have been forgotten: no
   emails to the blame list, but an email to the bot owner would help.

(A rough sketch of what 1 and 2 could look like follows after this message.)

> Also, the bisect step really should be automated... :)

It's not always simple, especially when self-hosting. If each step takes 7
hours, guessing what the outcome will be and waiting 7 days to realise the
guess was wrong is not a good use of resources. For those cases I always
bisect manually.

> You've now wasted 10 minutes or more of my time per slow, noisy bot. When
> I routinely get 10+ builder failure emails for changes that are clean,
> that's not a worthwhile investment.

I know. That's why I do that on my own bots; it's my time to spend.

Maybe we should divide the bots into three categories: Fast, Slow and
Experimental. Fast bots are everyone's responsibility. Slow bots are the
bot owners'. Experimental bots can safely be ignored. That's pretty much
what I do now with my NOC page.

As a bot owner, if I want to reduce the time I spend on slow bots, I'll
have to work hard to make them fast, not transfer the burden to the rest
of the community.

cheers,
--renato
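Points 1 and 2 above are mostly bookkeeping around the build logs. Below is
a minimal sketch of what that bookkeeping could look like, assuming the
bot's notifier has access to the failed step's log text and the list of
failing test names; the patterns, function names, and state file are
hypothetical, not an existing buildbot feature.

#!/usr/bin/env python3
# Hypothetical sketch of points 1 and 2: classify a failed build as an
# infrastructure problem vs. a real code problem, and decide whether to email
# the blame list again because the failure *signature* has changed.

import hashlib
import os
import re

# Point 1: log patterns that indicate the bot, not the commit, is at fault.
# These patterns are examples only.
INFRASTRUCTURE_PATTERNS = [
    r"No space left on device",
    r"Connection timed out",
    r"lost remote connection to the buildslave",
    r"clock skew detected",
]

def is_infrastructure_failure(log_text):
    """True if the failure looks like a machine/network problem to ignore."""
    return any(re.search(p, log_text) for p in INFRASTRUCTURE_PATTERNS)

# Point 2: a crude signature of *what* failed, so a different breakage on an
# already-red bot still triggers a fresh email.
def failure_signature(failed_step, failed_tests):
    blob = failed_step + "\n" + "\n".join(sorted(failed_tests))
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()

def should_notify(failed_step, failed_tests, state_file=".last_failure"):
    """Email the blame list only if this failure differs from the last one."""
    sig = failure_signature(failed_step, failed_tests)
    last = ""
    if os.path.exists(state_file):
        with open(state_file) as f:
            last = f.read().strip()
    with open(state_file, "w") as f:
        f.write(sig)
    return sig != last

Point 3 could then fall out of the same state file: if the stored signature
stays unchanged for more than some number of builds, mail the bot owner
instead of the blame list.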
On 05/21/2015 02:05 AM, Renato Golin wrote:
> On 21 May 2015 at 01:52, Philip Reames <listmail at philipreames.com> wrote:
>> As a randomly chosen example, one thing we could do is introduce the
>> notion of a "last good commit". Fast builders would cycle off ToT;
>> whenever one (or some subset) passes, that advances the last good
>> commit. Slower builders would cycle off the last good commit, not ToT.
>> We have all the mechanisms to implement this today: it could be as
>> simple as parsing the JSON output of buildbot in the script that runs
>> the slower build bots and syncing to that revision rather than ToT.
>
> Not all slow builders have the same sources as the fast builders. For
> example, our "full" builders include compiler-rt, while the fast ones
> don't.
>
>> At this point, you're long past the point I was grousing about. I'm not
>> arguing that long-running bots shouldn't notify; I'm arguing they
>> shouldn't report *obvious* false positives.
>
> Well, that's yet another fix we need for all builders. I think we're
> missing:
>
> 1. Detection of infrastructure vs. real code problems. There isn't a
>    simple way of doing this, so adding patterns for known
>    "infrastructure" problems to be ignored, and treating everything else
>    as an error, would be OK.
>
> 2. Detection of different failures. If new tests fail, or the build
>    fails instead of the tests, the bot should email *again*. This is
>    very problematic, and it is why people get so angry at broken bots.
>
> 3. Detection of long-running failures that might have been forgotten: no
>    emails to the blame list, but an email to the bot owner would help.
>
>> Also, the bisect step really should be automated... :)
>
> It's not always simple, especially when self-hosting. If each step takes
> 7 hours, guessing what the outcome will be and waiting 7 days to realise
> the guess was wrong is not a good use of resources. For those cases I
> always bisect manually.
>
>> You've now wasted 10 minutes or more of my time per slow, noisy bot.
>> When I routinely get 10+ builder failure emails for changes that are
>> clean, that's not a worthwhile investment.
>
> I know. That's why I do that on my own bots; it's my time to spend.
>
> Maybe we should divide the bots into three categories: Fast, Slow and
> Experimental. Fast bots are everyone's responsibility. Slow bots are the
> bot owners'. Experimental bots can safely be ignored. That's pretty much
> what I do now with my NOC page.
>
> As a bot owner, if I want to reduce the time I spend on slow bots, I'll
> have to work hard to make them fast, not transfer the burden to the rest
> of the community.

+1. I would be in full support of such a proposal.

> cheers,
> --renato