[ I am initially copying only a few lists since they seem like the most
impacted projects and I didn't want to spam all the mailing lists.
Please let me know if other lists should be included. ]

I submitted D68230 for review but this is not about that patch per se.
The patch allows update_cc_test_checks.py to process tests that should
check target asm rather than LLVM IR. We use this facility downstream
for our end-to-end tests. It strikes me that it might be useful for
upstream to do similar end-to-end testing.

Now that the monorepo is about to become the canonical source of truth,
we have an opportunity for convenient end-to-end testing that we didn't
easily have before with svn (yes, it could be done, but in an ugly
way). AFAIK the only upstream end-to-end testing we have is in
test-suite, and many of those codes are very large and/or unfocused
tests.

With the monorepo we have a place to put lit-style tests that exercise
multiple subprojects, for example tests that ensure the entire clang
compilation pipeline executes correctly. We could, for example, create
a top-level "test" directory and put end-to-end tests there. Some of
the things that could be tested include:

- Pipeline execution (debug-pass=Executions)
- Optimization warnings/messages
- Specific asm code sequences out of clang (e.g. ensure certain loops
  are vectorized; a sketch of such a test follows this message)
- Pragma effects (e.g. ensure loop optimizations are honored)
- Complete end-to-end PGO (generate a profile and re-compile)
- GPU/accelerator offloading
- Debuggability of clang-generated code

Each of these things is tested to some degree within their own
subprojects, but AFAIK there are currently no dedicated tests ensuring
such things work through the entire clang pipeline flow and with other
tools that make use of the results (debuggers, etc.). It is relatively
easy to break the pipeline while the individual subproject tests
continue to pass.

I realize that some folks prefer to work on only a portion of the
monorepo (for example, they just hack on LLVM). I am not sure how to
address those developers WRT end-to-end testing. On the one hand,
requiring them to run end-to-end testing means they will have to at
least check out and build the monorepo. On the other hand, it seems
less than ideal to have people developing core infrastructure and not
running tests.

I don't yet have a formal proposal but wanted to put this out to spur
discussion and gather feedback and ideas. Thank you for your interest
and participation!

-David
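To make this concrete, here is a minimal sketch of the kind of
lit-style end-to-end test meant by the "specific asm code sequences out
of clang" bullet above. It assumes a lit configuration where %clang
expands to the just-built clang; the function, target, and checked
mnemonic are illustrative choices, not something proposed in the RFC.

    // RUN: %clang -O2 --target=x86_64-unknown-linux-gnu -mavx2 -S -o - %s \
    // RUN:   | FileCheck %s

    // Check that this loop survives the whole clang pipeline as vector
    // code: the generated asm should contain a packed AVX add.
    // CHECK-LABEL: vadd:
    // CHECK: vaddps
    void vadd(float *restrict c, const float *restrict a,
              const float *restrict b, int n) {
      for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    }

Such a test would run under llvm-lit like any component test, but it
exercises the full source-to-asm flow rather than a single stage.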
> -----Original Message-----
> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of David Greene via cfe-dev
> Sent: Tuesday, October 08, 2019 12:50 PM
> To: llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org; openmp-dev at lists.llvm.org; lldb-dev at lists.llvm.org
> Subject: [cfe-dev] RFC: End-to-end testing
>
> [...]
>
> Each of these things is tested to some degree within their own
> subprojects, but AFAIK there are currently no dedicated tests ensuring
> such things work through the entire clang pipeline flow and with other
> tools that make use of the results (debuggers, etc.). It is relatively
> easy to break the pipeline while the individual subproject tests
> continue to pass.

I agree with all your points. End-to-end testing is a major hole in the
project infrastructure; it has been largely left up to the individual
vendors/packagers/distributors. The Clang tests verify that Clang will
produce some sort of not-unreasonable IR for given situations; the LLVM
tests verify that some (other) set of input IR will produce something
that looks not-unreasonable on the target side. Very little connects
the two.

There is more than nothing:

- test-suite has some quantity of code that is compiled end-to-end for
  some targets.
- lldb can be set up to use the just-built Clang to compile its tests,
  but those are focused on debug info and are nothing like
  comprehensive.
- libcxx likely also can use the just-built Clang, although I've never
  tried it so I don't know for sure. It obviously exercises just the
  runtime side of things.
- compiler-rt likewise. The sanitizer tests in particular I'd expect to
  be using the just-built Clang.
- debuginfo-tests also uses the just-built Clang but is a pathetically
  small set, and again focused on debug info.

I'm not saying the LLVM Project should invest in a commercial suite
(although I'd expect vendors to do so; Sony does). But a place to *put*
end-to-end tests seems entirely reasonable and useful, although I would
resist calling it simply "tests" (we have too many directories with
that name already).

> I realize that some folks prefer to work on only a portion of the
> monorepo (for example, they just hack on LLVM). I am not sure how to
> address those developers WRT end-to-end testing. On the one hand,
> requiring them to run end-to-end testing means they will have to at
> least check out and build the monorepo. On the other hand, it seems
> less than ideal to have people developing core infrastructure and not
> running tests.

People should obviously be running the tests for the project(s) they're
modifying. People aren't expected to run everything. That's why...
Bots. "Don't argue with the bots." I don't check out and build and test
everything, and I've broken LLDB, compiler-rt, and probably others from
time to time. Probably everybody has broken other projects
unexpectedly. That's what bots are for: to run the combinations and the
projects that I don't have the infrastructure or resources to do
myself. It's not up to me to run everything possible before committing;
it IS up to me to respond promptly to bot failures for my changes. I
don't see a new end-to-end test project being any different in that
respect.

> I don't yet have a formal proposal but wanted to put this out to spur
> discussion and gather feedback and ideas. Thank you for your interest
> and participation!

Thanks for bringing it up! It's been a pebble in my shoe for a long
time.

--paulr
David Blaikie via llvm-dev
2019-Oct-08 19:22 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
I have a bit of concern about this sort of thing - worrying it'll lead
to people being less cautious about writing the more isolated tests.

That said, clearly there's value in end-to-end testing for all the
reasons you've mentioned (& we do see these problems in practice -
recently DWARF indexing broke when support for more nuanced language
codes was added to Clang).

Dunno if they need a new place or should just be more stuff in
test-suite, though.

On Tue, Oct 8, 2019 at 9:50 AM David Greene via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> [...]
David Greene via llvm-dev
2019-Oct-08 19:46 UTC
[llvm-dev] [Openmp-dev] [cfe-dev] RFC: End-to-end testing
David Blaikie via Openmp-dev <openmp-dev at lists.llvm.org> writes:

> I have a bit of concern about this sort of thing - worrying it'll lead
> to people being less cautious about writing the more isolated tests.

That's a fair concern. Reviewers will still need to insist on small
component-level tests to go along with patches. We don't have to
sacrifice one to get the other.

> Dunno if they need a new place or should just be more stuff in
> test-suite, though.

There are at least two problems I see with using test-suite for this:

- It is a separate repository and thus is not as convenient as tests
  that live with the code. One cannot commit an end-to-end test
  atomically with the change meant to be tested.
- It is full of large codes, which is not the kind of testing I'm
  talking about.

Let me describe how I recently added some testing in our downstream
fork (a sketch of what the resulting tests might look like follows this
message):

- I implemented a new feature along with a C source test.
- I used clang to generate asm from that test and captured the small
  piece of it I wanted to check in an end-to-end test.
- I used clang to generate IR just before the feature kicked in and
  created an opt-style test for it. Generating this IR is not always
  straightforward and it would be great to have better tools to do
  this, but that's another discussion.
- I took the IR out of opt (after running my feature) and created an
  llc-style test out of it to check the generated asm. The checks are
  the same as in the original C end-to-end test.

So the tests are checking at each stage that the expected input is
generating the expected output, and the end-to-end test checks that we
go from source to asm correctly. These are all really small tests,
easily runnable as part of the normal "make check" process.

-David
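A sketch of what that chain of tests might look like. The pass name
(my-new-pass), the file layout, and the checked mnemonic are
placeholders made up for illustration; the actual downstream feature
and its checks are not shown in this thread.

    // (1) End-to-end test: C source through the full clang pipeline,
    //     checking only the small piece of asm the feature is
    //     responsible for.
    // RUN: %clang -O2 --target=x86_64-unknown-linux-gnu -S -o - %s | FileCheck %s
    // CHECK-LABEL: foo:
    // CHECK: addps
    void foo(float *restrict a, const float *restrict b, int n) {
      for (int i = 0; i < n; ++i)
        a[i] += b[i];
    }

    // (2) opt-style component test (a separate .ll file, not shown):
    //     the IR captured just before the new feature kicks in, run
    //     through only that pass and checked for the transformed IR,
    //     e.g.
    //       RUN: opt -passes=my-new-pass -S < %s | FileCheck %s
    //
    // (3) llc-style component test (a separate .ll file, not shown):
    //     the IR taken out of opt after the feature ran, compiled with
    //     llc and checked for the same asm as the end-to-end test (1),
    //     e.g.
    //       RUN: llc -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
    //       CHECK: addps

Each piece stays small, and only the end-to-end test depends on the
whole pipeline.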
Mehdi AMINI via llvm-dev
2019-Oct-09 02:14 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
> I have a bit of concern about this sort of thing - worrying it'll lead
> to people being less cautious about writing the more isolated tests.

I have the same concern. I really believe we need to be careful about
testing at the right granularity to keep things both modular and the
testing maintainable (for instance, checking vectorized ASM from a C++
source through clang has always been considered a bad FileCheck
practice). (Not saying that there is no space for better integration
testing in some areas.)

> That said, clearly there's value in end-to-end testing for all the
> reasons you've mentioned (& we do see these problems in practice -
> recently DWARF indexing broke when support for more nuanced language
> codes was added to Clang).
>
> Dunno if they need a new place or should just be more stuff in
> test-suite, though.
>
> On Tue, Oct 8, 2019 at 9:50 AM David Greene via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> [...]
>>
>> With the monorepo we have a place to put lit-style tests that
>> exercise multiple subprojects, for example tests that ensure the
>> entire clang compilation pipeline executes correctly.

I don't think I agree with the relationship to the monorepo: there was
nothing that prevented tests inside the clang project from exercising
the full pipeline already. I don't believe that the SVN repo structure
was really a factor in the way the testing was set up; instead, it was
a deliberate choice in the way we structure our testing. For instance,
I remember asking about implementing tests that check whether some
loops written in a C source file are properly vectorized by the -O2 /
-O3 pipeline, and it was deemed the kind of test that we don't want to
maintain: instead I was pointed at the test-suite to add better
benchmarks there for the end-to-end story. What is interesting is that
the test-suite is not gonna be part of the monorepo!

To be clear: I'm not saying here we can't change our way of testing, I
just don't think the monorepo has anything to do with it, and any
change should be carefully motivated and scoped into what
belongs/doesn't belong to integration tests.

>> We could, for example, create a top-level "test" directory and put
>> end-to-end tests there. Some of the things that could be tested
>> include:
>>
>> - Pipeline execution (debug-pass=Executions)
>> - Optimization warnings/messages
>> - Specific asm code sequences out of clang (e.g. ensure certain loops
>>   are vectorized)
>> - Pragma effects (e.g. ensure loop optimizations are honored)
>> - Complete end-to-end PGO (generate a profile and re-compile)
>> - GPU/accelerator offloading
>> - Debuggability of clang-generated code
>>
>> [...]

I'm not sure I really see much in your list that isn't purely about
testing clang itself here? Actually the first one seems more of a pure
LLVM test.

>> I realize that some folks prefer to work on only a portion of the
>> monorepo (for example, they just hack on LLVM). [...] On the other
>> hand, it seems less than ideal to have people developing core
>> infrastructure and not running tests.

I think we already expect LLVM developers to update clang APIs? And we
revert LLVM patches when clang testing is broken. So the expectation to
maintain the other in-tree projects isn't really new. It is true that
the monorepo will make it easy for everyone to reproduce most failures
locally and to find all the uses of an API across projects (which was
provided as a motivation to move to a monorepo model:
https://llvm.org/docs/Proposals/GitHubMove.html#monorepo ).

--
Mehdi
On 10/8/19 9:49 AM, David Greene via llvm-dev wrote:
> [...]
>
> Each of these things is tested to some degree within their own
> subprojects, but AFAIK there are currently no dedicated tests ensuring
> such things work through the entire clang pipeline flow and with other
> tools that make use of the results (debuggers, etc.). It is relatively
> easy to break the pipeline while the individual subproject tests
> continue to pass.

The two major concerns I see are a potential decay in component test
quality and an increase in the difficulty of changing components. The
former has already been discussed a bit downstream, so let me focus on
the latter.

A challenge we already have - as in, I've broken these tests and had to
fix them - is that an end-to-end test which checks either IR or
assembly ends up being extraordinarily fragile. Completely unrelated
profitable transforms create small differences which cause spurious
test failures. This is a very real issue today with the few end-to-end
clang tests we have, and I am extremely hesitant to expand those tests
without giving this workflow problem serious thought. If we don't, this
could bring development on middle-end transforms to a complete stop.
(Not kidding.)

A couple of approaches we could consider:

1. Simply restrict end-to-end tests to crash/assert cases (i.e. no
   property of the generated code is checked, other than that it is
   generated). This isn't as restrictive as it sounds when combined
   with coverage-guided fuzzer corpuses (a sketch of such a test
   follows this message).
2. Auto-update all diffs, but report them to a human user for
   inspection. This ends up meaning that tests never "fail" per se, but
   that individuals who have expressed interest in particular tests get
   an automated notification and a chance to respond on list with a
   reduced example.
3. As a variant on the former, don't auto-update tests, but only inform
   the *contributor* of an end-to-end test of a failure. Responsibility
   for determining failure vs false positive lies solely with them, and
   normal channels are used to report a failure after it has been
   confirmed/analyzed/explained.

I really think this is a problem we need to have thought through and
found a workable solution for before end-to-end testing as proposed
becomes a practically workable option.

Philip
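For option 1, a minimal sketch of a crash/assert-only end-to-end test
might look like the following; the flags and source are illustrative,
and no property of the output is checked.

    // RUN: %clang -O3 -g --target=x86_64-unknown-linux-gnu -c -o /dev/null %s
    // No FileCheck: the test passes as long as the full clang pipeline
    // (parse, optimize, codegen) gets through this input without
    // crashing or asserting. Inputs like this could also be drawn from
    // a coverage-guided fuzzer corpus.
    struct S { int a; float b; };
    float sum(struct S *s, int n) {
      float t = 0;
      for (int i = 0; i < n; ++i)
        t += s[i].a + s[i].b;
      return t;
    }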
David Greene via llvm-dev
2019-Oct-10 01:25 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
Philip Reames via cfe-dev <cfe-dev at lists.llvm.org> writes:

> A challenge we already have - as in, I've broken these tests and had
> to fix them - is that an end-to-end test which checks either IR or
> assembly ends up being extraordinarily fragile. Completely unrelated
> profitable transforms create small differences which cause spurious
> test failures. This is a very real issue today with the few end-to-end
> clang tests we have, and I am extremely hesitant to expand those tests
> without giving this workflow problem serious thought. If we don't,
> this could bring development on middle-end transforms to a complete
> stop. (Not kidding.)

Do you have a pointer to these tests? We literally have tens of
thousands of end-to-end tests downstream and while some are fragile,
the vast majority are not. A test that, for example, checks the entire
generated asm for a match is indeed very fragile. A test that checks
whether a specific instruction/mnemonic was emitted is generally not,
at least in my experience (see the sketch after this message).
End-to-end tests require some care in construction. I don't think
update_llc_test_checks.py-type operation is desirable. Still, you raise
a valid point and I think you present some good options below.

> A couple of approaches we could consider:
>
> 1. Simply restrict end-to-end tests to crash/assert cases (i.e. no
>    property of the generated code is checked, other than that it is
>    generated). This isn't as restrictive as it sounds when combined
>    with coverage-guided fuzzer corpuses.

I would be pretty hesitant to do this but I'd like to hear more about
how you see this working with coverage/fuzzing.

> 2. Auto-update all diffs, but report them to a human user for
>    inspection. This ends up meaning that tests never "fail" per se,
>    but that individuals who have expressed interest in particular
>    tests get an automated notification and a chance to respond on
>    list with a reduced example.

That's certainly workable.

> 3. As a variant on the former, don't auto-update tests, but only
>    inform the *contributor* of an end-to-end test of a failure.
>    Responsibility for determining failure vs false positive lies
>    solely with them, and normal channels are used to report a failure
>    after it has been confirmed/analyzed/explained.

I think I like this best of the three, but it raises the question of
what happens when the contributor is no longer contributing. Who's
responsible for the test? Maybe it just sits there until someone else
claims it.

> I really think this is a problem we need to have thought through and
> found a workable solution for before end-to-end testing as proposed
> becomes a practically workable option.

Noted. I'm very happy to have this discussion and work the problem.

-David
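To illustrate the fragility contrast above, here is a sketch (with an
invented function and target assumptions) of the two check styles being
discussed: pinning the whole instruction sequence versus checking only
for the mnemonic the test actually cares about.

    // RUN: %clang -O2 --target=x86_64-unknown-linux-gnu -S -o - %s | FileCheck %s

    // A fragile test would pin the entire instruction sequence, e.g.
    // matching
    //   movss (%rdi), %xmm0 / addss (%rsi), %xmm0 / retq
    // line by line; any unrelated scheduling or register-allocation
    // change then breaks the test.
    //
    // A focused test checks only the property of interest: that a
    // scalar floating-point add is emitted for this function.
    // CHECK-LABEL: fadd:
    // CHECK: addss
    float fadd(const float *a, const float *b) { return *a + *b; }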