Michael Kruse via llvm-dev
2020-Jun-24 15:03 UTC
[llvm-dev] [RFC] Compiled regression tests.
On Wed, Jun 24, 2020 at 00:37, David Blaikie <dblaikie at gmail.com> wrote:
> I'm pretty change averse - and am in this case (but doesn't mean other folks won't be in favor and doesn't mean it isn't worth trying, etc - but if it were up to me, at the moment, I'd decline)

That's understandable. New features also come with a cost that they need to recoup with their usefulness.

> To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn't get spread through different threads):

I added a remark to the Phabricator thread. (I kept these arguments there to keep the RFC itself short.)

> > Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests.
>
> Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than the more free-form C++-based checking proposed here.
>
> That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn't seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I'd worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn't mean experimentation isn't worthwhile, but I think it'd require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it's not a replacement, then I think we'd need to discuss what sort of situations this new thing is suitable for and what FileCheck should be used for going forward)

As mentioned in the Differential, generating the tests automatically will lose information about what actually is intended to be tested, making it harder to understand the test. If we had a method to just update all tests (Polly has `ninja polly-update-format` to automatically fix `ninja polly-check-format` failures), updating would be easier, but because of the lack of understanding, in practice most changes would just be glossed over.

We already have a split between unittests (e.g. llvm/unittests/Transforms/Scalar/LICMTest.cpp) and regression tests (llvm/test/Transforms/LICM/*.ll). New tests in D82426 are located next to the .ll files. The unittests could be moved there too.

Michael
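For illustration, a pass-running unittest of the kind discussed above currently looks roughly like the sketch below, loosely modeled on the existing LICMTest.cpp. The IR snippet, the test name, and the exact new-pass-manager boilerplate are illustrative only and may differ between LLVM versions; the point is how much setup a convenience utility could hide.

```cpp
#include "gtest/gtest.h"
#include "llvm/AsmParser/Parser.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar/LICM.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"

using namespace llvm;

TEST(CompiledRegressionSketch, RunLICMOnEmbeddedIR) {
  LLVMContext Ctx;
  SMDiagnostic Err;
  // IR embedded as a raw string literal, as existing unittests do.
  std::unique_ptr<Module> M = parseAssemblyString(R"IR(
    define void @f(i64 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %i.next = add nuw nsw i64 %i, 1
      %cmp = icmp ult i64 %i.next, %n
      br i1 %cmp, label %loop, label %exit
    exit:
      ret void
    }
  )IR", Err, Ctx);
  ASSERT_TRUE(M);

  // New-pass-manager boilerplate that a convenience utility could hide.
  PassBuilder PB;
  LoopAnalysisManager LAM;
  FunctionAnalysisManager FAM;
  CGSCCAnalysisManager CGAM;
  ModuleAnalysisManager MAM;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  ModulePassManager MPM;
  // LICMPass() is default-constructible in the LLVM of this era; newer
  // versions may require explicit options.
  MPM.addPass(createModuleToFunctionPassAdaptor(
      createFunctionToLoopPassAdaptor(LICMPass())));
  MPM.run(*M, MAM);

  // A real test would query a specific post-condition through the C++ API;
  // here we only check that the module is still well-formed.
  EXPECT_FALSE(verifyModule(*M, &errs()));
}
```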
David Blaikie via llvm-dev
2020-Jun-24 15:12 UTC
[llvm-dev] [RFC] Compiled regression tests.
On Wed, Jun 24, 2020 at 8:04 AM Michael Kruse <llvmdev at meinersbur.de> wrote:
> On Wed, Jun 24, 2020 at 00:37, David Blaikie <dblaikie at gmail.com> wrote:
> > I'm pretty change averse - and am in this case (but doesn't mean other folks won't be in favor and doesn't mean it isn't worth trying, etc - but if it were up to me, at the moment, I'd decline)
>
> That's understandable. New features also come with a cost that they need to recoup with their usefulness.
>
> > To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn't get spread through different threads):
>
> I added a remark to the Phabricator thread. (I kept these arguments there to keep the RFC itself short.)
>
> > > Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests.
> >
> > Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than the more free-form C++-based checking proposed here.
> >
> > That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn't seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I'd worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn't mean experimentation isn't worthwhile, but I think it'd require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it's not a replacement, then I think we'd need to discuss what sort of situations this new thing is suitable for and what FileCheck should be used for going forward)
>
> As mentioned in the Differential, generating the tests automatically will lose information about what actually is intended to be tested,

Agreed - and I didn't mean to suggest tests should be automatically generated. I work pretty hard in code reviews to encourage tests to be written as stably as possible so they are resilient to unrelated changes. The python scripts I wrote to update tests didn't require tests to be written in an automated fashion but were still able to migrate the significant majority of hand-written FileCheck test cases.

> making it harder to understand the test. If we had a method to just update all tests (Polly has `ninja polly-update-format` to automatically fix `ninja polly-check-format` failures), updating would be easier, but because of the lack of understanding, in practice most changes would just be glossed over.
> We already have a split between unittests (e.g. llvm/unittests/Transforms/Scalar/LICMTest.cpp) and regression tests (llvm/test/Transforms/LICM/*.ll). New tests in D82426 are located next to the .ll files. The unittests could be moved there too.

Yep, if there's a nice way to change lit/cmake so that unit tests can live alongside the FileCheck tests, I think that might be an improvement. Though for now I'd only expect to find a unit test for something when it's not reasonably FileCheck testable. In this case it looks like LICMTest.cpp is testing side effects on analyses when running LICM, which makes some sense - analyses have no textual representation, so they can't be readily FileCheck tested; only their side effects can.

- Dave
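To make that point concrete, a hedged sketch of checking an analysis result directly, instead of matching printed IR, could look like the following. The IR, the function name @g, and the test name are made up for this example; it only shows that a property like loop nesting depth, which never appears in the textual IR, has to be queried through the C++ API.

```cpp
#include "gtest/gtest.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/AsmParser/Parser.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/SourceMgr.h"

using namespace llvm;

TEST(AnalysisQuerySketch, InnerLoopDepth) {
  LLVMContext Ctx;
  SMDiagnostic Err;
  // A simple two-deep loop nest.
  std::unique_ptr<Module> M = parseAssemblyString(R"IR(
    define void @g(i64 %n) {
    entry:
      br label %outer
    outer:
      %i = phi i64 [ 0, %entry ], [ %i.next, %outer.latch ]
      br label %inner
    inner:
      %j = phi i64 [ 0, %outer ], [ %j.next, %inner ]
      %j.next = add nuw nsw i64 %j, 1
      %c.inner = icmp ult i64 %j.next, %n
      br i1 %c.inner, label %inner, label %outer.latch
    outer.latch:
      %i.next = add nuw nsw i64 %i, 1
      %c.outer = icmp ult i64 %i.next, %n
      br i1 %c.outer, label %outer, label %exit
    exit:
      ret void
    }
  )IR", Err, Ctx);
  ASSERT_TRUE(M);

  // Loop structure is an analysis result with no textual representation;
  // it can only be checked by querying LoopInfo directly.
  Function &G = *M->getFunction("g");
  DominatorTree DT(G);
  LoopInfo LI(DT);
  BasicBlock *Inner = nullptr;
  for (BasicBlock &BB : G)
    if (BB.getName() == "inner")
      Inner = &BB;
  ASSERT_TRUE(Inner != nullptr);
  EXPECT_EQ(LI.getLoopDepth(Inner), 2u);
}
```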
Michael Kruse via llvm-dev
2020-Jun-24 16:43 UTC
[llvm-dev] [RFC] Compiled regression tests.
On Wed, Jun 24, 2020 at 10:12, David Blaikie <dblaikie at gmail.com> wrote:
> > As mentioned in the Differential, generating the tests automatically will lose information about what actually is intended to be tested,
>
> Agreed - and I didn't mean to suggest tests should be automatically generated. I work pretty hard in code reviews to encourage tests to be written as stably as possible so they are resilient to unrelated changes. The python scripts I wrote to update tests didn't require tests to be written in an automated fashion but were still able to migrate the significant majority of hand-written FileCheck test cases.

My argument is that it is hard to impossible to really test only the relevant bits using FileCheck. CHECK-DAG, named regexes etc. are mechanisms that make this possible, but at the same time they make the verification more complicated. I don't think it is feasible to write single-purpose update scripts for most changes. Even if there is one, it is even less feasible to ensure for all tests that they still test what was originally intended, especially with CHECK-NOT.

I had to put a lot of effort into updating loop metadata tests. Because metadata nodes get sequential numbers in whatever order the IR emitter decides to put them, it is tempting to use the form ![[NODENAME:.*]] for each occurrence so you can reorder the lines in the order they occur, as indeed I found in many regression tests. Three problems with this:

1. When the regex is specified, it will overwrite the content of previous placeholders, i.e. if used everywhere, it is equivalent to {{.*}}.
2. Using a backtracking regex engine, there are inputs with exponential time behaviour.
3. It will match more than needed, e.g. a list of nodes !{![[NODENAME:.*]]} will also match !{!0, !1}, and FileCheck will complain somewhere else that !0 = !{!0, !1} does not match ![[NODENAME]] = !{![[NODENAME]], ![[LOOP1:.*]]} (if the 'regex everywhere' mistake was not made).

A "robust" test would use [[NODENAME:[0-9]+]] and CHECK-DAG, as some tests do, but that also makes the CHECK lines even longer and more cluttered. In contrast to instructions, not all metadata lines have recognizable keywords that could indicate what they might intend to match. CHECK-DAG will happily match any metadata node that has the same number of operands, deferring a mismatch report to some later CHECK line, which, because of the DAG semantics, may still match earlier lines.

Unfortunately, most tests we have do not even use placeholders for metadata nodes, including those generated by update_test_checks.py. I looked into improving the script in this regard, but its implementation is function-centric, making it difficult to add module-level placeholders.

As a result, I estimate that I had to invest about twice as much time updating/fixing the tests as writing the code changes themselves. Because of this experience, I find updating FileCheck tests very frustrating. I'd prefer not to test whether specific metadata nodes are present, but whether the LLVM API to query them (such as `isAnnotatedParallel`) returns the expected result.

Michael
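A minimal sketch of that last idea, assuming the IR is parsed from a string inside a unittest: the function name, metadata numbering, and test name are made up, and the typed-pointer syntax matches the LLVM of this era. The check goes through `Loop::isAnnotatedParallel` instead of matching the metadata nodes textually.

```cpp
#include "gtest/gtest.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/AsmParser/Parser.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/SourceMgr.h"

using namespace llvm;

TEST(ParallelAnnotationSketch, IsAnnotatedParallel) {
  LLVMContext Ctx;
  SMDiagnostic Err;
  // A loop whose only memory access belongs to the access group listed in
  // llvm.loop.parallel_accesses; the exact metadata numbering is irrelevant
  // to the check below.
  std::unique_ptr<Module> M = parseAssemblyString(R"IR(
    define void @f(i32* noalias %A, i64 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %ptr = getelementptr inbounds i32, i32* %A, i64 %i
      store i32 42, i32* %ptr, !llvm.access.group !1
      %i.next = add nuw nsw i64 %i, 1
      %cmp = icmp ult i64 %i.next, %n
      br i1 %cmp, label %loop, label %exit, !llvm.loop !0
    exit:
      ret void
    }
    !0 = distinct !{!0, !2}
    !1 = distinct !{}
    !2 = !{!"llvm.loop.parallel_accesses", !1}
  )IR", Err, Ctx);
  ASSERT_TRUE(M);

  // Query the property through the API instead of CHECK-ing metadata nodes.
  Function &F = *M->getFunction("f");
  DominatorTree DT(F);
  LoopInfo LI(DT);
  ASSERT_FALSE(LI.empty());
  Loop *L = *LI.begin();
  EXPECT_TRUE(L->isAnnotatedParallel());
}
```

The test stays valid even if the printed numbering or ordering of the metadata nodes changes, which is exactly the kind of churn the placeholder-regex CHECK lines above try to paper over.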