David Greene via llvm-dev
2019-Oct-10 21:26 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
Renato Golin via cfe-dev <cfe-dev at lists.llvm.org> writes:

> I'd recommend trying to move any e2e tests into the test-suite and
> make it easier to run, and leave specific tests only in the repo (to
> guarantee independence of components).

That would be a shame. Where is test-suite run right now? Are there bots? How are regressions reported?

> The last thing we want is to create direct paths from front-ends to
> back-ends and make LLVM IR transformation less flexible.

I'm not sure I follow. Can you explain this a bit?

-David
Renato Golin via llvm-dev
2019-Oct-10 22:29 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
On Thu, 10 Oct 2019 at 22:26, David Greene <dag at cray.com> wrote:

> That would be a shame. Where is test-suite run right now? Are there
> bots? How are regressions reported?

There is no shame in making the test-suite better.

We do have bots running them in full CI for multiple targets, yes. Regressions are reported and fixed. The benchmarks are also followed by a smaller crowd and regressions on those are also fixed (but slower).

I'm not proposing to move e2e off to a dark corner, I'm proposing to have a scaled testing strategy that can ramp up and down as needed, without upsetting the delicate CI and developer balance.

Sure, e2e tests are important, but they need to catch bugs that the other tests don't catch, not be our front-line safety net.

We have planned to do incremental testing with buildbots for years and Apple has done something like that in their GreenBots. We have talked about moving that upstream, but time spent on testing is really, really scant.

A few years back there was a big effort to clean up the LIT tests from duplicates and speed up inefficient code, and a lot of tests were removed. If we just add the e2e today and they never catch anything relevant, they'll be the next candidates to go.

The delta that e2e can test is really important, but really small and fairly rare. So running it less frequently (every few dozen commits) will most likely be enough for anything we can possibly respond to upstream.

My main point is that we need to be realistic with what we can do upstream, which is very different from what a big company can do downstream.

Past experiences have, over and over, shown us that new shiny CI toys get rusty, noisy, and dumped.

We want to have the tests, in a place anyone can test, that the bots *will* test periodically, and that don't annoy developers often enough to be a target.

In a nutshell:
 * We still need src2src tests, to ensure connection points (mainly IR) are canonical and generic, avoiding hidden contracts
 * We want the end2end tests to *add* coverage, not overlap with or replace existing tests
 * We don't want those tests to become a burden to developers by breaking on unrelated changes and making bots red for obscure reasons
 * We don't want them to be a burden to our CI efforts, slowing down regular LIT testing and becoming a target for removal

The orders of magnitude for number of commits we want to run tests are:
 * LIT base, linker, compiler-RT, etc: ~1
 * Test-suite correctness, end-2-end: ~10
 * Multi-stage build, benchmarks: ~100

We already have that ratio (somewhat) with buildbots, so it should be simple to add e2e to the test suite at the right scale.

> > The last thing we want is to create direct paths from front-ends to
> > back-ends and make LLVM IR transformation less flexible.
>
> I'm not sure I follow. Can you explain this a bit?

Right, I had written a long paragraph about it but deleted it in the final version of my email. :)

The main point is that we want to avoid hidden contracts between the front-end and the back-end.

We want to make sure all front-ends can produce canonical IR, and that the middle-end can optimise the IR and that the back-end can lower that to asm in a way that runs correctly on the target. As we have multiple back-ends and are soon to have a second official front-end, we want to make sure we have good coverage on the multi-step tests (AST to IR, IR to asm, etc).

If we add e2e tests that are not covered by piece-wise tests, we risk losing that clarity.

I think e2e tests have to expose more complex issues, like front-end changes, pass manager order, optimisation levels, linking issues, etc. They can check for asm, run on the target, or both. In the test-suite we have more budget to do a more complete job at it than in LIT check-all.

Hope this helps.

cheers,
--renato
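The tiered cadence described above (LIT on roughly every commit, test-suite correctness and e2e roughly every tenth, multi-stage builds and benchmarks roughly every hundredth) can be sketched as a small scheduling function. This is an illustration only, not how the LLVM buildbots are actually configured: the tier names, the `TIER_INTERVALS` table, and the `tiers_due` helper are all hypothetical, and a plain commit counter is assumed.

```python
# Hypothetical tier table. The intervals mirror the ~1 / ~10 / ~100
# orders of magnitude from the message; the names are made up.
TIER_INTERVALS = {
    "lit": 1,                      # LIT base, linker, compiler-RT, etc.
    "test-suite-e2e": 10,          # test-suite correctness, end-to-end
    "multistage-benchmarks": 100,  # multi-stage build, benchmarks
}


def tiers_due(commit_index, intervals=TIER_INTERVALS):
    """Return the tiers to run at this commit, most frequent tier first."""
    due = [name for name, every in intervals.items()
           if commit_index % every == 0]
    # Order by interval so the cheap, frequent tier is reported first.
    return sorted(due, key=lambda name: intervals[name])


if __name__ == "__main__":
    for commit in (7, 10, 100):
        print(commit, tiers_due(commit))
```

With these assumed intervals, most commits trigger only the LIT tier, so adding e2e tests to the middle tier leaves the per-commit cost unchanged.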
David Greene via llvm-dev
2019-Oct-11 14:20 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
Renato Golin via cfe-dev <cfe-dev at lists.llvm.org> writes:

> On Thu, 10 Oct 2019 at 22:26, David Greene <dag at cray.com> wrote:
>> That would be a shame. Where is test-suite run right now? Are there
>> bots? How are regressions reported?
>
> There is no shame in making the test-suite better.

That's not what I meant, sorry. I mean it would be a shame not to be able to put end-to-end tests next to the code they test. Tests that are separated from code either tend to not get written/committed or tend to not get run pre-merge.

> We do have bots running them in full CI for multiple targets, yes.
> Regressions are reported and fixed. The benchmarks are also followed
> by a smaller crowd and regressions on those are also fixed (but
> slower).

How are regressions reported? On llvm-dev?

> I'm not proposing to move e2e off to a dark corner, I'm proposing to
> have a scaled testing strategy that can ramp up and down as needed,
> without upsetting the delicate CI and developer balance.

I don't think I quite understand what you mean. CI is part of development. If tests break we have to fix them, regardless of where the tests live.

> Sure, e2e tests are important, but they need to catch bugs that the
> other tests don't catch, not be our front-line safety net.

Of course.

> A few years back there was a big effort to clean up the LIT tests from
> duplicates and speed up inefficient code, and a lot of tests were
> removed. If we just add the e2e today and they never catch anything
> relevant, they'll be the next candidates to go.

I'm confused. On the one hand you say you don't want to put e2e tests in a dark corner, but here you speculate they could be easily removed.

Presumably a test was added because there was some failure that other tests did not catch. It's true that once a test is fixed it's very likely it will never break again. Is that a reason to remove tests?

If something end-to-end breaks in the field, it would be great if we could capture a component-level test for it. That would be my first goal. I agree it can be tempting to stop at an e2e test level and not investigate further down. We definitely want to avoid that. My guess is that over time we'll gain experience of what a good e2e test is for the LLVM project.

> The delta that e2e can test is really important, but really small and
> fairly rare. So running it less frequently (every few dozen commits)
> will most likely be enough for anything we can possibly respond to
> upstream.

I think that's probably reasonable.

> Past experiences have, over and over, shown us that new shiny CI toys
> get rusty, noisy, and dumped.

I don't think e2e testing is shiny or new. :)

> We want to have the tests, in a place anyone can test, that the bots
> *will* test periodically, and that don't annoy developers often enough
> to be a target.

What do you mean by "annoy?" Taking too long to run?

> In a nutshell:
> * We still need src2src tests, to ensure connection points (mainly
> IR) are canonical and generic, avoiding hidden contracts

Yes.

> * We want the end2end tests to *add* coverage, not overlap with or
> replace existing tests

Yes, but I suspect people will disagree about what constitutes "overlap."

> * We don't want those tests to become a burden to developers by
> breaking on unrelated changes and making bots red for obscure reasons

Well, tests are going to break. If a test is too fragile it should be fixed or removed.

> * We don't want them to be a burden to our CI efforts, slowing down
> regular LIT testing and becoming a target for removal

I certainly think less frequent running of tests could help with that if it proves to be a burden.

> The orders of magnitude for number of commits we want to run tests are:
> * LIT base, linker, compiler-RT, etc: ~1
> * Test-suite correctness, end-2-end: ~10
> * Multi-stage build, benchmarks: ~100
>
> We already have that ratio (somewhat) with buildbots, so it should be
> simple to add e2e to the test suite at the right scale.

Would it be possible to keep them in the monorepo but have bots that exercise those tests at the test-suite frequency? I suspect that if e2e tests live in test-suite very few people will ever run them before merging to master.

>> > The last thing we want is to create direct paths from front-ends to
>> > back-ends and make LLVM IR transformation less flexible.
>>
>> I'm not sure I follow. Can you explain this a bit?
>
> Right, I had written a long paragraph about it but deleted it in the
> final version of my email. :)
>
> The main point is that we want to avoid hidden contracts between the
> front-end and the back-end.
>
> We want to make sure all front-ends can produce canonical IR, and that
> the middle-end can optimise the IR and that the back-end can lower
> that to asm in a way that runs correctly on the target. As we have
> multiple back-ends and are soon to have a second official front-end,
> we want to make sure we have good coverage on the multi-step tests
> (AST to IR, IR to asm, etc).

Absolutely.

> If we add e2e tests that are not covered by piece-wise tests, we risk
> losing that clarity.

Gotcha.

> I think e2e tests have to expose more complex issues, like front-end
> changes, pass manager order, optimisation levels, linking issues, etc.
> They can check for asm, run on the target, or both. In the test-suite
> we have more budget to do a more complete job at it than in LIT
> check-all.

Thanks for the explanation, that helped clarify things for me. I still think the kinds of e2e tests I'm thinking of are much closer to the existing LIT tests in the monorepo than things in test-suite. I expect them to be quite small. They wouldn't necessarily need to run as part of check-all (and indeed, I've been told that no one runs check-all anyway because it's too fragile).

-David
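One way to picture the arrangement suggested here, where small e2e tests sit in the monorepo next to the LIT tests but stay out of the default developer run, is an opt-in flag per test directory. The sketch below assumes a hypothetical `Suite` record, a hypothetical `select_suites` helper, and a made-up `e2e/test` directory; it is not LLVM's actual build or lit configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    path: str     # test directory inside the monorepo
    opt_in: bool  # True: skipped unless explicitly requested


SUITES = (
    Suite("llvm/test", opt_in=False),
    Suite("clang/test", opt_in=False),
    Suite("e2e/test", opt_in=True),  # made-up location for small e2e tests
)


def select_suites(suites, include_opt_in=False):
    """Return the paths to run: defaults always, opt-in ones only on request."""
    return [s.path for s in suites if include_opt_in or not s.opt_in]


if __name__ == "__main__":
    print(select_suites(SUITES))                       # default developer run
    print(select_suites(SUITES, include_opt_in=True))  # slower bot run
```

A default developer run would call `select_suites(SUITES)`, while a less frequent bot, or a developer about to merge a risky change, would pass `include_opt_in=True`.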
Philip Reames via llvm-dev
2019-Oct-13 22:59 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
+1 to the points made here. Renato very nicely explained the tradeoffs involved.