> On Oct 7, 2015, at 3:28 PM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 7 October 2015 at 23:20, Chris Matthews <chris.matthews at apple.com> wrote:
>> For instance, a build that fails with a ninja error, will say so, same with a
>> svn failure or a Jenkins exception.
>
> Will these get mailed to developers? Or admins?

Unfortunately, Jenkins does not let me determine who to email based on the failure cause. That would be wonderful! The detected problem is right at the top of the email though, so at least you don’t have to click the link. For infrastructure problems we sometimes add helpful messages. For instance, we had an issue with about 1 in 20 builds failing with a “killed -9” message; when that happened we could just say sorry and print a link to the bug. Mostly it is just a nice fast link right to the test case failure or build failure.

>> We also have a few policies on email: only email on first failure, don’t
>> email on exception and abort, and don’t email long blame lists (more than 10
>> people).
>
> That's the same. The only problem we found is that exception is
> treated as "success" because we don't want to email when the master is
> reloaded. But the sequence red->exception->red emails, since
> exception->red is treated as good->bad.

Yep, that has to be changed. It is not a useful state change.

>> In all our CI cluster we use phased builds.
>
> I'd love to have that! :D
>
>> I do think flaky bots should only email their owners.
>
> I agree. The problem is defining flaky. A lot of flaky behaviour can
> be mapped back to the compiler (like Clang abusing of the C++ ABI, or
> code assuming 64-bit types in odd ways).

I define flaky as a build that fails for a reason unrelated to the code on the blame list.

>> I also think we
>> should nominate some reliable fast builds to produce vetted revision, and
>> trigger most other builds from those.
>
> This would be the perfect world.

We should move towards getting this set up then. There is some code that needs to be set up in buildbot, as well as an agreement on what gets attached to what.

> When Apple moved to GreenBots, I was expecting that we'd be moving too
> not long after. I was also expecting the LLVM Foundation to be driving
> this change, and I'd have dived head first to have what you have.
>
> But it makes no sense for me to do that on my own, locally. Nor I have
> bandwidth or resources to do that for everyone else.
>
> I don't really care if it's buildbot, Jenkins or orc slaves, as long
> as I can spend my time doing something else, I'm happy.
>
> cheers,
> --renato
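To make the email policy above concrete, here is a minimal sketch of the decision in Python. It is not the buildbot or Jenkins implementation; the result names, `MAX_BLAME`, and both function names are made up for illustration. The one idea it shows is that exception and abort results are skipped when looking for the previous state, so red->exception->red does not count as a new failure.

```python
from typing import List, Optional

# Hypothetical result labels, not buildbot's or Jenkins' own constants.
SUCCESS, FAILURE, EXCEPTION, ABORTED = "success", "failure", "exception", "aborted"

# "Long blame list" threshold taken from the thread (more than 10 people).
MAX_BLAME = 10


def last_real_result(history: List[str]) -> Optional[str]:
    """Return the most recent success/failure, ignoring exception and abort."""
    for result in reversed(history):
        if result in (SUCCESS, FAILURE):
            return result
    return None


def should_email(history: List[str], current: str, blame_list: List[str]) -> bool:
    """Decide whether the blame list should be emailed for the current build."""
    if current != FAILURE:
        # Never email on success, exception, or abort.
        return False
    if len(blame_list) > MAX_BLAME:
        # Don't email long blame lists.
        return False
    # Only email on the first failure: the previous real result was not red.
    return last_real_result(history) != FAILURE
```

With this rule, `should_email(["failure", "exception"], "failure", ["dev"])` is false, because the exception in the middle is not treated as a return to green.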
On 7 October 2015 at 23:58, Chris Matthews <chris.matthews at apple.com> wrote:
> Unfortunately, Jenkins does not let me determine who to email based on the failure cause. That would be wonderful! The detected problem is right at the top of the email though, so at least you don’t have to click the link.

Yes, I've seen this, and I loved it.

> For infrastructure problems we sometimes add helpful messages. For instance, we had an issue with about 1 in 20 builds failing with a “killed -9” message; when that happened we could just say sorry and print a link to the bug. Mostly it is just a nice fast link right to the test case failure or build failure.

Indeed.

> I define flaky as a build that fails for a reason unrelated to the code on the blame list.

Me too, but latent *code* problems introduced before this build need to show up somewhere, and it's non-trivial to detect those kinds of errors based on the blame list alone. An example of this is the C++ ABI bugs.

They're rare enough that we can deal with them. But they look *a lot* like flaky bots, enough that they could trigger people to disable bots without further consideration.

> We should move towards getting this set up then. There is some code that needs to be set up in buildbot, as well as an agreement on what gets attached to what.

Excellent! I have some spare hardware that I can use for experimental builders.

But, this being Jenkins, I suspect you'll need to push the jobs through, rather than me pulling them like buildbots. For that, I'll need some firewall configurations.

Just let me know when you're ready! :)

cheers,
--renato
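As an illustration of the "detected problem at the top of the email" idea discussed above, the following is a rough sketch of matching a build log against a table of known failure causes. It is not how the Jenkins setup in this thread actually does it; the patterns, the `KNOWN_CAUSES` table, and the function names are assumptions, and the bug link is left as a placeholder.

```python
import re
from typing import Optional, Tuple

# Illustrative table: (cause name, log pattern, optional note for the email).
# The patterns are guesses at typical log lines, not a vetted list.
KNOWN_CAUSES = [
    ("ninja error", re.compile(r"^ninja: (error|build stopped)", re.M), None),
    ("svn failure", re.compile(r"^svn: E\d+", re.M), None),
    (
        "infrastructure: process killed",
        re.compile(r"killed -9", re.I),
        "Sorry, this is a known infrastructure issue; see <link to bug>.",
    ),
]


def detect_cause(log: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (cause, note) for the first known pattern found in the log."""
    for name, pattern, note in KNOWN_CAUSES:
        if pattern.search(log):
            return name, note
    return None


def email_header(log: str) -> str:
    """Build the line(s) placed at the top of the failure email."""
    hit = detect_cause(log)
    if hit is None:
        return "Failure cause: unknown (see build log)"
    name, note = hit
    line = f"Failure cause: {name}"
    return line if note is None else f"{line}\n{note}"
```

A table like this only recognises causes someone has already seen. A latent code problem such as the C++ ABI bugs mentioned above would still come out as "unknown", so it does not settle the question of what counts as flaky.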
> On Oct 7, 2015, at 4:23 PM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 7 October 2015 at 23:58, Chris Matthews <chris.matthews at apple.com> wrote:
>> Unfortunately, Jenkins does not let me determine who to email based on the failure cause. That would be wonderful! The detected problem is right at the top of the email though, so at least you don’t have to click the link.
>
> Yes, I've seen this, and I loved it.
>
>> For infrastructure problems we sometimes add helpful messages. For instance, we had an issue with about 1 in 20 builds failing with a “killed -9” message; when that happened we could just say sorry and print a link to the bug. Mostly it is just a nice fast link right to the test case failure or build failure.
>
> Indeed.
>
>> I define flaky as a build that fails for a reason unrelated to the code on the blame list.
>
> Me too, but latent *code* problems introduced before this build need
> to show up somewhere, and it's non-trivial to detect those kinds of
> errors based on the blame list alone. An example of this is the C++
> ABI bugs.
>
> They're rare enough that we can deal with them. But they look *a lot*
> like flaky bots, enough that they could trigger people to disable bots
> without further consideration.
>
>> We should move towards getting this set up then. There is some code that needs to be set up in buildbot, as well as an agreement on what gets attached to what.
>
> Excellent! I have some spare hardware that I can use for experimental builders.
>
> But, this being Jenkins, I suspect you'll need to push the jobs
> through, rather than me pulling them like buildbots. For that, I'll
> need some firewall configurations.

This can be done with buildbot or Jenkins. Both platforms support it. It is a huge amount of work to port jobs to Jenkins, so that is not to be taken lightly. This might be a good discussion to have at the conference; either way there is going to be some work to change configurations and link things up.

A simple buildbot-specific fix is to just switch some of our bots to manual-only launches, then use curl commands at the end of other builds to trigger them. That is even possible to do between Jenkins and buildbot. That might be a quick and dirty way to phase some builds.

> Just let me know when you're ready! :)
>
> cheers,
> --renato
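The "trigger downstream builds at the end of an upstream build" idea can be sketched roughly as follows. This uses Python's `urllib` in place of the curl commands mentioned above, and it assumes the downstream jobs are Jenkins jobs with remote triggering enabled and reachable at `<base>/job/<name>/build?token=<token>`. The host, job names, and token are hypothetical, and a buildbot target would need a different URL.

```python
from typing import List
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def jenkins_trigger_url(base_url: str, job: str, token: str) -> str:
    """Build the remote-trigger URL for one downstream job (assumed layout)."""
    query = urlencode({"token": token})
    return f"{base_url.rstrip('/')}/job/{quote(job, safe='')}/build?{query}"


def plan_triggers(
    upstream_result: str, base_url: str, downstream_jobs: List[str], token: str
) -> List[str]:
    """Return the URLs to hit; nothing is triggered unless upstream was green."""
    if upstream_result != "success":
        return []
    return [jenkins_trigger_url(base_url, job, token) for job in downstream_jobs]


def fire(url: str, timeout: float = 30.0) -> int:
    """POST to a trigger URL and return the HTTP status code."""
    request = Request(url, data=b"", method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

A last build step could then call `plan_triggers(...)` and `fire(...)` on each URL, so the manual-only bots run only on revisions that a fast, reliable builder has already vetted.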