Michael Kruse via llvm-dev
2020-Jun-24  15:03 UTC
[llvm-dev] [RFC] Compiled regression tests.
Am Mi., 24. Juni 2020 um 00:37 Uhr schrieb David Blaikie <dblaikie at gmail.com>:> I'm pretty change averse - and am in this case (but doesn't mean other folks won't be in favor and doesn't mean it isn't worth trying, etc - but if it were up to me, at the moment, I'd decline)That's understandable. New features also come with a cost that they need to recoup with their usefulness.> To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn't get spread through different threads):I added a remark to the Phabrictor thread. (I kept these arguments there to keep the RFC itself short)> > Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests. > > Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than a more free-form C++ based checking proposed here. > > That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn't seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I'd worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn't mean experimentation isn't worthwhile, but I think it'd require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it's not a replacement, then I think we'd need to discuss what sort of situations this new thing is suitable and what FileCheck should be used for going forward)As mentioned in the Differential, generating the tests automatically will lose information about what actually is intended to be tested, making it harder to understand the test. If the had a method to just update all tests (Polly as `ninja polly-update-format` to automatically update `ninja polly-check-format` failures), updating is easier, but because of the lack of understanding in practice most changes will just be glossed over. We already have a split between unittests (eg. llvm/Transforms/Scalar/LICMTests.cpp) and regression tests (llvm/test/Transforms/LICM/*.ll). New tests in D82426 are located next to the .ll files. The unittests could be moved there too. Michael
David Blaikie via llvm-dev
2020-Jun-24  15:12 UTC
[llvm-dev] [RFC] Compiled regression tests.
On Wed, Jun 24, 2020 at 8:04 AM Michael Kruse <llvmdev at meinersbur.de> wrote:> > Am Mi., 24. Juni 2020 um 00:37 Uhr schrieb David Blaikie <dblaikie at gmail.com>: > > I'm pretty change averse - and am in this case (but doesn't mean other folks won't be in favor and doesn't mean it isn't worth trying, etc - but if it were up to me, at the moment, I'd decline) > > That's understandable. New features also come with a cost that they > need to recoup with their usefulness. > > > > To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn't get spread through different threads): > > I added a remark to the Phabrictor thread. > (I kept these arguments there to keep the RFC itself short) > > > > > Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests. > > > > Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than a more free-form C++ based checking proposed here. > > > > That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn't seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I'd worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn't mean experimentation isn't worthwhile, but I think it'd require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it's not a replacement, then I think we'd need to discuss what sort of situations this new thing is suitable and what FileCheck should be used for going forward) > > As mentioned in the Differential, generating the tests automatically > will lose information about what actually is intended to be tested,Agreed - and I didn't mean to suggest tests should be automatically generated. I work pretty hard in code reviews to encourage tests to be written as stable-y as possible so they are resilient to unrelated changes. The python scripts I wrote to update tests didn't require tests to be written in an automated fashion but were still able to migrate the significant majority of hand-written FileCheck test cases.> making it harder to understand the test. > If the had a method to just update all tests (Polly as `ninja > polly-update-format` to automatically update `ninja > polly-check-format` failures), updating is easier, but because of the > lack of understanding in practice most changes will just be glossed > over. > > We already have a split between unittests (eg. > llvm/Transforms/Scalar/LICMTests.cpp) and regression tests > (llvm/test/Transforms/LICM/*.ll). New tests in D82426 are located next > to the .ll files. The unittests could be moved there too.Yep, if there's a nice way to change lit/cmake so that unit tests can live alongside the FileCheck tests, I think that might be an improvement. Though for now I'd only expect to find a unit test for something when it's not reasonably FileCheck testable, in this case it looks like the LICMTest.cpp is testing side effects on analyses when running LICM, which makes some sense - analyses have no textual representation, so can't be readily FileCheck tested, only their side effects can. - Dave
Michael Kruse via llvm-dev
2020-Jun-24  16:43 UTC
[llvm-dev] [RFC] Compiled regression tests.
Am Mi., 24. Juni 2020 um 10:12 Uhr schrieb David Blaikie <dblaikie at gmail.com>:> > As mentioned in the Differential, generating the tests automatically > > will lose information about what actually is intended to be tested, > > Agreed - and I didn't mean to suggest tests should be automatically > generated. I work pretty hard in code reviews to encourage tests to be > written as stable-y as possible so they are resilient to unrelated > changes. The python scripts I wrote to update tests didn't require > tests to be written in an automated fashion but were still able to > migrate the significant majority of hand-written FileCheck test cases.My argument is that it is hard to impossible to really only test the relevant bits using FileCheck. CHECK-DAG, named regexes etc are mechanisms making this possible, but at the same time make the verification more complicated. I don't think it is feasible to write single-purpose update scripts for most changes. Even if there is one, it is even less feasible to ensure for all tests that they still test what was originally intended, especially with CHECK-NOT. I had to put a lot of effort into updating loop metadata tests. Because metadata nodes have sequential numbers in order the IR emitter decides to put them, it is tempting to use the form ![[NODENAME:.*]] for each occurance so you can reorder the lines in the order they occur, as indeed I found in many regression tests. Three problems with this: 1. When the regex is specified, it will overwrite the content of previous placeholders, i.e. if used everywhere, it is equivalent to {{.*}} 2. Using a backtracking regex engine, there are inputs with exponential time behaviour.having mistake 3. It will match more than needed, e.g. a list of nodes {![[NODENAME:.*]]} will also match !{!0, !1} and FileCheck will complain somewhere else that !0 = {!0, !1} does not match ![[NODENAME]] = {![[NODENAME]], ![[LOOP1:.*]]} (if not the 'regex everywhere' mistake was made) A "robust" test would use [[NODENAME:[0-9]+]] and CHECK-DAG, as some tests do, but also making the CHECK lines even longer and more cluttered. In contrast to instructions, not all metadata lines have recognizable keywords that could indicate what it might intend to match. CHECK-DAG will happily match with any metadata node that has the same number of operands, deferring a mismatch report to some later CHECK line, but due to the DAG part continuing to match with previous lines. Unfortunately, most tests we have do not even use placeholders for metadata nodes, including those generated by update_test_checks.py. I looked into improving the script in this regard, but its implementation is function-centric, making it difficult to add module-level placeholders. As a result, I estimate that I had to invest about twice the time to update/fix tests than writing the code changes themselves. Due to my experience, I find updating FileCheck tests very frustrating. I'd prefer not to test whether specific metadata nodes are present, but whether the LLVM API to query them (such as `isAnnotatedParallel`) returns the expected result. Michael