David Greene via llvm-dev
2019-Oct-11 14:20 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
Renato Golin via cfe-dev <cfe-dev at lists.llvm.org> writes:

> On Thu, 10 Oct 2019 at 22:26, David Greene <dag at cray.com> wrote:
>> That would be a shame. Where is test-suite run right now? Are there
>> bots? How are regressions reported?
>
> There is no shame in making the test-suite better.

That's not what I meant, sorry. I meant it would be a shame not to be
able to put end-to-end tests next to the code they test. Tests that
are separated from the code they exercise tend either not to get
written/committed or not to get run pre-merge.

> We do have bots running them in full CI for multiple targets, yes.
> Regressions are reported and fixed. The benchmarks are also followed
> by a smaller crowd and regressions on those are also fixed (but
> slower).

How are regressions reported? On llvm-dev?

> I'm not proposing to move e2e off to a dark corner, I'm proposing to
> have a scaled testing strategy that can ramp up and down as needed,
> without upsetting the delicate CI and developer balance.

I don't think I quite understand what you mean. CI is part of
development. If tests break, we have to fix them, regardless of where
the tests live.

> Sure, e2e tests are important, but they need to catch bugs that the
> other tests don't catch, not being our front-line safety net.

Of course.

> A few years back there was a big effort to clean up the LIT tests from
> duplicates and speed up inefficient code, and a lot of tests were
> removed. If we just add the e2e today and they never catch anything
> relevant, they'll be the next candidates to go.

I'm confused. On the one hand you say you don't want to put e2e tests
in a dark corner, but here you speculate that they could easily be
removed. Presumably a test was added because there was some failure
that other tests did not catch. It's true that once a test is fixed
it's very likely it will never break again. Is that a reason to remove
tests?

If something end-to-end breaks in the field, it would be great if we
could capture a component-level test for it. That would be my first
goal. I agree it can be tempting to stop at the e2e test level and not
investigate further down. We definitely want to avoid that. My guess
is that over time we'll gain experience of what a good e2e test for
the LLVM project looks like.

> The delta that e2e can test is really important, but really small and
> fairly rare. So running it less frequently (every few dozen commits)
> will most likely be enough for anything we can possibly respond to
> upstream.

I think that's probably reasonable.

> Past experiences have, over and over, shown us that new shiny CI toys
> get rusty, noisy, and dumped.

I don't think e2e testing is shiny or new. :)

> We want to have the tests in a place anyone can test, that the bots
> *will* test periodically, and that don't annoy developers often
> enough to become a target.

What do you mean by "annoy"? Taking too long to run?

> In a nutshell:
>  * We still need src2src tests, to ensure connection points (mainly
>    IR) are canonical and generic, avoiding hidden contracts

Yes.

>  * We want the end2end tests to *add* coverage, not overlap with or
>    replace existing tests

Yes, but I suspect people will disagree about what constitutes
"overlap."

>  * We don't want those tests to become a burden to developers by
>    breaking on unrelated changes and making bots red for obscure
>    reasons

Well, tests are going to break. If a test is too fragile, it should be
fixed or removed.

>  * We don't want them to be a burden to our CI efforts, slowing down
>    regular LIT testing and becoming a target for removal

I certainly think running them less frequently could help with that,
if they prove to be a burden.

> The orders of magnitude for the number of commits between test runs
> are:
>  * LIT base, linker, compiler-rt, etc.: ~1
>  * Test-suite correctness, end-to-end: ~10
>  * Multi-stage build, benchmarks: ~100
>
> We already have that ratio (somewhat) with buildbots, so it should be
> simple to add e2e to the test-suite at the right scale.

Would it be possible to keep them in the monorepo but have bots that
exercise those tests at the test-suite frequency? I suspect that if
e2e tests live in test-suite, very few people will ever run them
before merging to master.

>>> The last thing we want is to create direct paths from front-ends to
>>> back-ends and make LLVM IR transformation less flexible.
>>
>> I'm not sure I follow. Can you explain this a bit?
>
> Right, I had written a long paragraph about it but deleted it in the
> final version of my email. :)
>
> The main point is that we want to avoid hidden contracts between the
> front-end and the back-end.
>
> We want to make sure all front-ends can produce canonical IR, that
> the middle-end can optimise the IR, and that the back-end can lower
> it to asm in a way that runs correctly on the target. As we have
> multiple back-ends and are soon to have a second official front-end,
> we want to make sure we have good coverage on the multi-step tests
> (AST to IR, IR to asm, etc.).

Absolutely.

> If we add e2e tests that are not covered by piece-wise tests, we risk
> losing that clarity.

Gotcha.

> I think e2e tests have to expose more complex issues, like front-end
> changes, pass manager order, optimisation levels, linking issues,
> etc. They can check for asm, run on the target, or both. In the
> test-suite we have more budget to do a more complete job of it than
> in LIT check-all.

Thanks for the explanation, that helped clarify things for me.

I still think the kinds of e2e tests I'm thinking of are much closer
to the existing LIT tests in the monorepo than to things in
test-suite. I expect them to be quite small. They wouldn't necessarily
need to run as part of check-all (and indeed, I've been told that no
one runs check-all anyway because it's too fragile).

                            -David
Renato Golin via llvm-dev
2019-Oct-11 15:23 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
Hi David,

You answer some of your own questions down below, so I'll try to
collate the responses and shorten my reply.

On Fri, 11 Oct 2019 at 15:20, David Greene <dag at cray.com> wrote:
> How are regressions reported? On llvm-dev?

They're buildbots, exactly like any other: direct email, llvm-commits,
IRC, bugzilla. There is no distinction; broken bots need to be fixed.

llvm-dev is not the place to report bugs.

> I'm confused. On the one hand you say you don't want to put e2e tests
> in a dark corner, but here you speculate that they could easily be
> removed. Presumably a test was added because there was some failure
> that other tests did not catch. It's true that once a test is fixed
> it's very likely it will never break again. Is that a reason to
> remove tests?

Sorry, my point is about the dynamics between the number of tests,
their coverage, time to run, frequency of *unrelated* breakage, etc.

There are no set rules, but there is a back-pressure as developers and
bot owners tend to breakages.

> What do you mean by "annoy"? Taking too long to run?

Tests that break more often are looked at more often, and if their
breakages overlap with those of other, simpler tests, then developers
will begin to question their importance. Tests that take too long to
run will be looked into, and if they don't add much, their removal can
be requested. That pressure is higher on the LIT side than in the
test-suite.

I'm trying to find a place where we can put the tests that will run at
the appropriate frequency and have the lowest probability of upsetting
CI and developers, so we can evolve them into what they *need* to be,
rather than cap them from the start and end up with something sub-par.

> Would it be possible to keep them in the monorepo but have bots that
> exercise those tests at the test-suite frequency? I suspect that if
> e2e tests live in test-suite, very few people will ever run them
> before merging to master.

Bots are pretty dumb: either they run something or they don't.

But more importantly, if we split out the e2e tests in LIT, then
people won't run them before merging to master anyway.

Truth is, we don't *need* to. That's the whole point of having a fast
CI, and the community really respects that.

As long as we have the tests running every few dozen commits, and bot
owners and developers work to fix them in time, we're good.

Furthermore, the test-suite already has e2e tests in there, so it is
the right place to add more. We can have more control over which tools
and libraries to use, how to check for quality, etc.

> I still think the kinds of e2e tests I'm thinking of are much closer
> to the existing LIT tests in the monorepo than to things in
> test-suite. I expect them to be quite small.

Adding tests to LIT means all fast bots will be slower. Adding them to
the test-suite means all test-suite bots will still take about the
same time.

If the tests only need to run once every few dozen commits, then LIT
is clearly not the right place for them.

> They wouldn't necessarily need to run as part of check-all (and
> indeed, I've been told that no one runs check-all anyway because it's
> too fragile).

check-all doesn't need to check everything that is in the repo, but it
should check everything that is built.

So if you build llvm+clang, then you should *definitely* check both.
"make check" doesn't do that.

With the monorepo this may change slightly, but we still need a way to
test everything that our patches touch, including clang, compiler-rt,
and others.

I always ran check-all before every patch, FWIW.
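The point about check-all covering everything that is built can be made concrete with a typical monorepo workflow. This is a hedged sketch: the project list and source path are illustrative assumptions, not a prescribed configuration.

```shell
# Configure with the projects your patch touches (hypothetical layout:
# running from a build directory next to the monorepo's llvm/ dir).
cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" ../llvm

# check-all runs the lit suites for every project that was configured
# and built here: llvm + clang + compiler-rt, not just llvm.
ninja check-all

# Narrower targets exist per project; this is roughly the old
# "make check", which does NOT cover clang or compiler-rt.
ninja check-llvm
```

The key design point is that check-all is scoped by the build configuration, so a bot owner (or developer) chooses its coverage by choosing which projects to enable.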
Robinson, Paul via llvm-dev
2019-Oct-15 14:55 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
> -----Original Message-----
> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Renato Golin
> via cfe-dev
> Sent: Friday, October 11, 2019 11:24 AM
> To: David Greene <dag at cray.com>
> Cc: llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org; Gerolf Hoflehner
> <ghoflehner at apple.com>; openmp-dev at lists.llvm.org; lldb-dev at lists.llvm.org
> Subject: Re: [cfe-dev] [llvm-dev] RFC: End-to-end testing
>
> [...]
>
> But more importantly, if we split out the e2e tests in LIT, then
> people won't run them before merging to master anyway.

Depends on whether they are part of check-all.

> Furthermore, the test-suite already has e2e tests in there, so it is
> the right place to add more. We can have more control over which
> tools and libraries to use, how to check for quality, etc.

My understanding is that test-suite had large-ish executable tests.
David is talking about small compile-only e2e tests. These would
hardly take any more time than any other existing lit test.

>> I still think the kinds of e2e tests I'm thinking of are much closer
>> to the existing LIT tests in the monorepo than to things in
>> test-suite. I expect them to be quite small.
>
> Adding tests to LIT means all fast bots will be slower. Adding them
> to the test-suite means all test-suite bots will still take about the
> same time.
>
> If the tests only need to run once every few dozen commits, then LIT
> is clearly not the right place for them.

The lit-versus-test-suite distinction is not the right one. Bots don't
run "lit tests" as one big lump; they run the tests for a configured
set of projects. If the e2e tests are in with all the other clang
tests, then they get run by the clang bots. If they are in a different
project (test-suite or their own), then they get run by the bots that
run that project. This is decided by the bot owner.

> check-all doesn't need to check everything that is in the repo, but
> it should check everything that is built.
>
> [...]
>
> I always ran check-all before every patch, FWIW.

Yep. Although I run check-all before *starting* on a patch, to make
sure the starting point is clean. It usually is, but I've been caught
enough times to be slightly wary.
--paulr
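For concreteness, the small, compile-only end-to-end test the thread keeps returning to might look something like the following lit-style test. This is a hypothetical sketch, not an existing test: the lit substitution, triple, and CHECK lines are illustrative assumptions, and a real test would check a property that motivated it (a vectorization decision, an ABI detail, a pass-ordering effect) rather than this trivial one.

```c
// RUN: %clang_cc1 -O2 -triple x86_64-unknown-linux-gnu -S -o - %s \
// RUN:   | FileCheck %s
//
// End-to-end in the compile-only sense: C source through the front-end,
// middle-end, and back-end to assembly, in one invocation. The CHECK
// lines deliberately pin only stable properties (the symbol is emitted
// and the function returns) rather than exact instruction sequences,
// to keep the test robust against unrelated codegen changes.

int add(int a, int b) { return a + b; }

// CHECK-LABEL: add:
// CHECK: ret
```

A test like this runs in milliseconds, which is the basis for the claim that such tests cost no more than ordinary lit tests; the open question in the thread is where it should live and how often bots should run it, not whether it is expensive.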