thr3ads.net - llvm dev - [llvm-dev] Buildbot Noise [Oct 2015]


Chris Matthews via llvm-dev

2015-Oct-07 22:58 UTC

[llvm-dev] Buildbot Noise

> On Oct 7, 2015, at 3:28 PM, Renato Golin <renato.golin at linaro.org> wrote:
> 
> On 7 October 2015 at 23:20, Chris Matthews <chris.matthews at apple.com> wrote:
>> For instance, a build that fails with a ninja error will say so; the same
>> goes for an svn failure or a Jenkins exception.
> 
> Will these get mailed to developers? Or admins?
Unfortunately, Jenkins does not let me determine who to email based on the
failure cause. That would be wonderful! The detected problem is right at the top
of the email, though, so at least you don’t have to click the link. For
infrastructure problems we sometimes add helpful messages. For instance, we had
an issue with about 1 in 20 builds failing with a “killed -9” message; when that
happened we could just say sorry and print a link to the bug. Mostly it is just
a nice fast link right to the test case failure or build failure.
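The failure-cause line Chris describes comes from matching the build log against known patterns. Below is a minimal Python sketch of that idea, assuming plain-text logs. The patterns, cause labels, and the `classify_failure` name are hypothetical illustrations, not the rules used on Apple's or LLVM's bots and not a Jenkins API.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical pattern table: each entry maps a log regex to a cause label.
# Order matters; the first match wins.
FAILURE_CAUSES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"^ninja: build stopped", re.MULTILINE), "Build failure (ninja)"),
    (re.compile(r"^svn: E\d+", re.MULTILINE), "Source checkout failure (svn)"),
    (re.compile(r"killed\s+-9", re.IGNORECASE), "Infrastructure: process killed (-9)"),
    # lit prints one "FAIL: <suite> :: <test>" line per failing test.
    (re.compile(r"^FAIL: (.+)$", re.MULTILINE), "Test failure"),
]


def classify_failure(log: str) -> Optional[str]:
    """Return a one-line cause for the top of the email, or None if unknown."""
    for pattern, cause in FAILURE_CAUSES:
        match = pattern.search(log)
        if match is None:
            continue
        # Patterns with a capture group append the captured detail (e.g. the test name).
        return f"{cause}: {match.group(1)}" if pattern.groups else cause
    return None
```

An infrastructure label such as the "killed -9" one is where a canned apology and bug link could be attached.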
> 
> 
>> We also have a few policies on email: only email on first failure, don’t
>> email on exception and abort, and don’t email long blame lists (more than
>> 10 people).
> 
> That's the same. The only problem we found is that an exception is
> treated as "success" because we don't want to email when the master is
> reloaded. But the sequence red->exception->red emails, since
> exception->red is treated as good->bad.
> 
Yep, that has to be changed. It is not a useful state change.
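One way to get the behaviour both agree on is to skip exceptions and aborts entirely when looking back for the previous result, instead of recording them as "success". A minimal Python sketch, assuming the notifier can see the list of earlier results; the `Result` enum and function names are hypothetical, not buildbot or Jenkins API.

```python
from enum import Enum
from typing import Optional, Sequence


class Result(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    EXCEPTION = "exception"  # e.g. infrastructure error or master reload
    ABORTED = "aborted"


# Results that are not a real state of the code and must not count as a transition.
NON_STATES = {Result.EXCEPTION, Result.ABORTED}


def last_real_result(history: Sequence[Result]) -> Optional[Result]:
    """Most recent SUCCESS or FAILURE, ignoring exceptions and aborts."""
    for result in reversed(history):
        if result not in NON_STATES:
            return result
    return None


def should_email(history: Sequence[Result], current: Result) -> bool:
    """Email only on the first real failure; never on exception or abort."""
    if current is not Result.FAILURE:
        return False
    # red -> exception -> red: the previous real result is still FAILURE, so stay quiet.
    return last_real_result(history) is not Result.FAILURE
```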
> 
>> In all our CI clusters we use phased builds.
> 
> I'd love to have that! :D
> 
> 
>> I do think flaky bots should only email their owners.
> 
> I agree. The problem is defining flaky. A lot of flaky behaviour can
> be mapped back to the compiler (like Clang abusing the C++ ABI, or
> code assuming 64-bit types in odd ways).
I define flaky as a build that fails for a reason unrelated to the code on the
blame list.
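Combining this definition with the blame-list rule above gives a simple recipient policy: failures unrelated to the blame list go only to the bot owner, and long blame lists get no email. A minimal Python sketch; the `recipients` function and the `infra_failure` flag are hypothetical, and deciding that flag is the hard part.

```python
from typing import List

MAX_BLAME_LIST = 10  # "don't email long blame lists (more than 10 people)"


def recipients(blame_list: List[str], owner: str, infra_failure: bool) -> List[str]:
    """Decide who is emailed about a failing build."""
    if infra_failure:
        # Flaky: the failure is unrelated to the code on the blame list.
        return [owner]
    unique = sorted(set(blame_list))
    if len(unique) > MAX_BLAME_LIST:
        # Too many candidates for the email to be useful.
        return []
    return unique
```

As Renato notes next, latent code bugs (such as C++ ABI issues) can look exactly like infrastructure flakiness, so an automatic `infra_failure` decision risks hiding real problems.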
> 
> 
>> I also think we should nominate some reliable fast builds to produce
>> vetted revisions, and trigger most other builds from those.
> 
> This would be the perfect world.
We should move towards getting this set up, then. There is some code that needs
to be set up in buildbot, as well as an agreement on what gets attached to what.
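A minimal Python sketch of the vetted-revision idea, assuming each fast "gate" builder reports pass/fail per SVN revision number. The builder names, data layout, and function names are hypothetical and not part of buildbot or Jenkins. Phase two is triggered only when a newer revision has passed every gate.

```python
from typing import Dict, List, Optional, Set, Tuple

# builder name -> {revision: passed?}
GateResults = Dict[str, Dict[int, bool]]


def vetted_revision(gate_results: GateResults) -> Optional[int]:
    """Newest revision that every gate builder has built and passed."""
    common: Optional[Set[int]] = None
    for per_rev in gate_results.values():
        passed = {rev for rev, ok in per_rev.items() if ok}
        common = passed if common is None else common & passed
    return max(common) if common else None


def phase_two_triggers(
    gate_results: GateResults, downstream: List[str], last_triggered: Optional[int]
) -> List[Tuple[str, int]]:
    """(builder, revision) pairs to start, or [] if nothing new was vetted."""
    rev = vetted_revision(gate_results)
    if rev is None or (last_triggered is not None and rev <= last_triggered):
        return []
    return [(name, rev) for name in downstream]
```

For example, with gates `{"fast-x86": {100: True, 101: True, 102: False}, "fast-arm": {100: True, 101: True}}`, revision 101 is the newest vetted one, so slow builders would be started at r101 rather than at whatever happens to be at the tip.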
> 
> When Apple moved to GreenBots, I was expecting that we'd be moving too
> not long after. I was also expecting the LLVM Foundation to be driving
> this change, and I'd have dived head first to have what you have.
> 
> But it makes no sense for me to do that on my own, locally. Nor do I have
> the bandwidth or resources to do that for everyone else.
> 
> I don't really care if it's buildbot, Jenkins or orc slaves, as long
> as I can spend my time doing something else, I'm happy.
> 
> cheers,
> --renato

Renato Golin via llvm-dev

2015-Oct-07 23:23 UTC


[llvm-dev] Buildbot Noise

On 7 October 2015 at 23:58, Chris Matthews <chris.matthews at apple.com> wrote:
> Unfortunately, Jenkins does not let me determine who to email based on the
> failure cause. That would be wonderful! The detected problem is right at the
> top of the email, though, so at least you don’t have to click the link.
Yes, I've seen this, and I loved it.

> For infrastructure problems we sometimes add helpful messages. For
> instance, we had an issue with about 1 in 20 builds failing with a
> “killed -9” message; when that happened we could just say sorry and print
> a link to the bug. Mostly it is just a nice fast link right to the test
> case failure or build failure.
Indeed.

> I define flaky as a build that fails for a reason unrelated to the code on
> the blame list.
Me too, but latent *code* problems introduced before this build need
to show up somewhere, and it's non-trivial to detect those kinds of
errors based on the blame list alone. An example of this is the C++
ABI bugs.

They're rare enough that we can deal with them. But they look *a lot*
like flaky bots, enough that they could lead people to disable bots
without further consideration.

> We should move towards getting this set up, then. There is some code that
> needs to be set up in buildbot, as well as an agreement on what gets
> attached to what.
Excellent! I have some spare hardware that I can use for experimental builders.

But, this being Jenkins, I suspect you'll need to push the jobs
through, rather than me pulling them like buildbots. For that, I'll
need some firewall configurations.

Just let me know when you're ready! :)

cheers,
--renato

Chris Matthews via llvm-dev

2015-Oct-07 23:52 UTC


[llvm-dev] Buildbot Noise

> On Oct 7, 2015, at 4:23 PM, Renato Golin <renato.golin at linaro.org> wrote:
> 
> On 7 October 2015 at 23:58, Chris Matthews <chris.matthews at apple.com> wrote:
>> Unfortunately, Jenkins does not let me determine who to email based on
>> the failure cause. That would be wonderful! The detected problem is right
>> at the top of the email, though, so at least you don’t have to click the
>> link.
> 
> Yes, I've seen this, and I loved it.
> 
> 
>> For infrastructure problems we sometimes add helpful messages. For
>> instance, we had an issue with about 1 in 20 builds failing with a
>> “killed -9” message; when that happened we could just say sorry and print
>> a link to the bug. Mostly it is just a nice fast link right to the test
>> case failure or build failure.
> 
> Indeed.
> 
> 
>> I define flaky as a build that fails for a reason unrelated to the code
>> on the blame list.
> 
> Me too, but latent *code* problems introduced before this build need
> to show up somewhere, and it's non-trivial to detect those kinds of
> errors based on the blame list alone. An example of this is the C++
> ABI bugs.
> 
> They're rare enough that we can deal with them. But they look *a lot*
> like flaky bots, enough that they could lead people to disable bots
> without further consideration.
> 
> 
>> We should move towards getting this set up, then. There is some code
>> that needs to be set up in buildbot, as well as an agreement on what gets
>> attached to what.
> 
> Excellent! I have some spare hardware that I can use for experimental
> builders.
> 
> But, this being Jenkins, I suspect you'll need to push the jobs
> through, rather than me pulling them like buildbots. For that, I'll
> need some firewall configurations.
This can be done with buildbot or Jenkins; both platforms support it. It is a
huge amount of work to port jobs to Jenkins, so that is not to be taken lightly.
This might be a good discussion to have at the conference. Either way, there is
going to be some work to change configurations and link things up.

A simple buildbot-specific fix is to just switch some of our bots to manual-only
launches, then use curl commands at the end of other builds to trigger them.
That is even possible to do between Jenkins and buildbot. That might be a quick
and dirty way to phase some builds.
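The curl trigger can also be expressed with Python's standard library. This sketch only builds the POST request that an upstream build's final step would send. The `/builders/<name>/force` path, the `revision` and `reason` form fields, the master URL, and the builder name are all assumptions for illustration; check the force-build form your own master actually exposes.

```python
import urllib.parse
import urllib.request


def build_trigger_request(
    master_url: str, builder: str, revision: int, reason: str
) -> urllib.request.Request:
    """Build (but do not send) a POST asking a manual-only builder to start."""
    # Hypothetical endpoint layout and form fields, not a confirmed buildbot API.
    url = f"{master_url.rstrip('/')}/builders/{urllib.parse.quote(builder)}/force"
    body = urllib.parse.urlencode({"revision": str(revision), "reason": reason})
    return urllib.request.Request(url, data=body.encode("ascii"), method="POST")


def send_trigger(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Passing the vetted revision explicitly keeps the downstream builder on the same code the upstream build tested, instead of whatever is at the tip when it starts.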
> 
> Just let me know when you're ready! :)
> 
> cheers,
> --renato
