On 7/1/20 11:13 AM, Michael Kruse wrote:

> On Wed., Jul 1, 2020 at 09:33, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > I definitely agree that we should not be trying to do this kind of
> > checking using textual metadata-node matching in FileCheck. The
> > alternative already available is to add an analysis pass with some
> > kind of verifier output. This output, not the raw metadata itself,
> > can be checked by FileCheck. We also need to check the verification
> > code, but at least that's something we can keep just in one place.
> > For parallel annotations, we already have such a thing (we can run
> > opt -loops -analyze; e.g., in
> > test/Analysis/LoopInfo/annotated-parallel-complex.ll). We also do
> > this kind of thing for the cost model (by running with -cost-model
> > -analyze). To what extent would making more-extensive use of this
> > technique address the use cases you're trying to address?
>
> The CHECK lines in annotated-parallel-complex.ll are:
>
>   ; CHECK: Parallel Loop at depth 1
>   ; CHECK-NOT: Parallel
>   ; CHECK: Loop at depth 2
>   ; CHECK: Parallel Loop
>   ; CHECK: Parallel Loop
>
> When adding this test, I had to change LoopInfo to emit the
> "Parallel" in front of "Loop". For readability, I would have
> preferred the parallel info as a "tag", such as `Loop (parallel) at
> depth 1`, but this would break other tests that check "Loop at depth
> 1". Later I noticed that there are regression tests that check
> "LoopFullUnrollPass on Loop at depth 3 containing: %l0.0.0<header>",
> but it seems I got lucky in that none of these loops have parallel
> annotations.
>
> "CHECK-NOT" is inherently fragile. It is too easy to make a change in
> LLVM that alters the text output without noticing that the test no
> longer checks what it was supposed to check. For a
> FileCheck-friendlier output, LoopInfo could emit "NonParallel" and
> the test could match that. However, this clutters the output for
> humans, would actually break the "LoopFullUnrollPass on Loop at depth
> 3 ..." tests, and "CHECK: Parallel" would match "NonParallel" as
> well, since FileCheck ignores word boundaries.
>
> The CHECK lines also test more than necessary. The first and third
> CHECK lines check the "at depth" as well, to make them match the
> correct loop (and not, e.g., the next inner loop), even though we are
> not interested in the loop depths themselves. Ironically, this is the
> reason why there cannot be tags between "Loop" and "at depth".

We can have different printing modes. There can be a more-human-friendly
mode and a more-FileCheck-friendly mode. Or modes customized for
different kinds of tests. I agree, however, that this does not solve the
fragility problems with CHECK-NOT.

> Not all of the loop metadata have passes that print them. For
> instance, there are loop properties such as llvm.loop.isvectorized.
> Reading those is usually done using utility functions such as
> getBooleanLoopAttribute(L, "llvm.loop.isvectorized"). A solution
> using FileCheck would be to add another pass that prints loop
> metadata. That pass would only be used for testing, making the
> release LLVM binaries larger than necessary, and it would still have
> the other problems.
>
> Processing the IR through a tool can make the output more
> FileCheck-friendly, but it doesn't make its problems disappear. IMHO
> it adds to the maintenance burden since it adds more textual
> interfaces.
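> To make the alternative concrete: a compiled regression test could
> query such a property directly through the existing C++ API instead
> of matching printed output. A rough sketch (assuming gtest and the
> usual LLVM unittest setup; this is an illustration, not the exact
> interface the RFC proposes):
>
>   #include "llvm/Analysis/LoopInfo.h"
>   #include "llvm/AsmParser/Parser.h"
>   #include "llvm/IR/Dominators.h"
>   #include "llvm/IR/LLVMContext.h"
>   #include "llvm/IR/Module.h"
>   #include "llvm/Support/SourceMgr.h"
>   #include "gtest/gtest.h"
>
>   using namespace llvm;
>
>   TEST(LoopMetadata, IsVectorized) {
>     LLVMContext Ctx;
>     SMDiagnostic Err;
>     // A loop that has already been annotated as vectorized.
>     std::unique_ptr<Module> M = parseAssemblyString(R"IR(
>       define void @f(i64 %n) {
>       entry:
>         br label %loop
>       loop:
>         %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
>         %i.next = add i64 %i, 1
>         %cmp = icmp ult i64 %i.next, %n
>         br i1 %cmp, label %loop, label %exit, !llvm.loop !0
>       exit:
>         ret void
>       }
>       !0 = distinct !{!0, !1}
>       !1 = !{!"llvm.loop.isvectorized", i32 1}
>     )IR", Err, Ctx);
>     ASSERT_TRUE(M);
>
>     Function *F = M->getFunction("f");
>     DominatorTree DT(*F);
>     LoopInfo LI(DT);
>     Loop *L = *LI.begin();
>
>     // Query the metadata through the same utility the passes use;
>     // no textual interface is involved.
>     EXPECT_TRUE(getBooleanLoopAttribute(L, "llvm.loop.isvectorized"));
>   }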
That's the interesting question... it does add to the maintenance
burden. However, having textual outputs is also quite convenient when
debugging things (because I can change the input and see the output
quickly, without needing to create and compile another program).
Obviously, at some point, this becomes ridiculous. How much is too much?
I don't know. But it's also not clear to me that we're at that point
yet. We could add more textual analysis outputs and still have that be a
net benefit in many places.

In cases where the requisite output would just be too specific, we do
have unit tests. Should we just add more? Maybe we're too happy to add
lit tests instead of unit tests for some of these cases.

>
> Michael

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Michael Kruse via llvm-dev
2020-Jul-01 19:06 UTC
[llvm-dev] [RFC] Compiled regression tests.
On Wed., Jul 1, 2020 at 11:37, Hal Finkel <hfinkel at anl.gov> wrote:

> We can have different printing modes. There can be a
> more-human-friendly mode and a more-FileCheck-friendly mode. Or modes
> customized for different kinds of tests. I agree, however, that this
> does not solve the fragility problems with CHECK-NOT.

This would be similar to git's porcelain and plumbing modes. However,
even with git, which has had this distinction from the beginning,
scripts often use porcelain output.

Another example is commands that change their output depending on
whether stdout is a terminal or a pipe. However, I find such a
distinction between modes more confusing than helpful.

> That's the interesting question... it does add to the maintenance
> burden. However, having textual outputs is also quite convenient when
> debugging things (because I can change the input and see the output
> quickly, without needing to create and compile another program).
> Obviously, at some point, this becomes ridiculous. How much is too
> much? I don't know. But it's also not clear to me that we're at that
> point yet. We could add more textual analysis outputs and still have
> that be a net benefit in many places.
>
> In cases where the requisite output would just be too specific, we do
> have unit tests. Should we just add more? Maybe we're too happy to
> add lit tests instead of unit tests for some of these cases.

The RFC is not about replacing all uses of FileCheck; there are
certainly cases where it is straightforward, simple, and robust. But
for some things it would be nice to have another tool in the toolbox.
The more workarounds, FileCheck features, test generators, etc. that
are needed to author appropriate tests, the more I get the impression
that FileCheck is the wrong tool.

As an example, take `clang -verify` tests. It is certainly possible to
check diagnostic output using FileCheck, so why does clang have a
-verify option?
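For reference, such a test annotates the expected diagnostic directly
on the line of code that triggers it, roughly like this:

  // RUN: %clang_cc1 -fsyntax-only -verify %s

  void f() {
    int x = y; // expected-error {{use of undeclared identifier 'y'}}
  }

The test fails if the diagnostic does not appear on that line, or if
any diagnostic appears that is not annotated.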
Michael

On 7/1/20 2:06 PM, Michael Kruse wrote:

> On Wed., Jul 1, 2020 at 11:37, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > We can have different printing modes. There can be a
> > more-human-friendly mode and a more-FileCheck-friendly mode. Or
> > modes customized for different kinds of tests. I agree, however,
> > that this does not solve the fragility problems with CHECK-NOT.
>
> This would be similar to git's porcelain and plumbing modes. However,
> even with git, which has had this distinction from the beginning,
> scripts often use porcelain output.

Fair point. In this case, however, we control all of the relevant
scripts. We can have a policy about which modes can be used in
regression tests and which are designed only for human consumption. We
can mark them as such.

> Another example is commands that change their output depending on
> whether stdout is a terminal or a pipe. However, I find such a
> distinction between modes more confusing than helpful.

I find that annoying :-)

> > That's the interesting question... it does add to the maintenance
> > burden. However, having textual outputs is also quite convenient
> > when debugging things (because I can change the input and see the
> > output quickly, without needing to create and compile another
> > program). Obviously, at some point, this becomes ridiculous. How
> > much is too much? I don't know. But it's also not clear to me that
> > we're at that point yet. We could add more textual analysis outputs
> > and still have that be a net benefit in many places.
> >
> > In cases where the requisite output would just be too specific, we
> > do have unit tests. Should we just add more? Maybe we're too happy
> > to add lit tests instead of unit tests for some of these cases.
>
> The RFC is not about replacing all uses of FileCheck; there are
> certainly cases where it is straightforward, simple, and robust. But
> for some things it would be nice to have another tool in the toolbox.
> The more workarounds, FileCheck features, test generators, etc. that
> are needed to author appropriate tests, the more I get the impression
> that FileCheck is the wrong tool.

FileCheck on raw IR is not the right approach in many cases. I think we
all agree about that. The question is: if our go-to solution in such
cases is to introduce an analysis pass with a textual output that
FileCheck can process, is that bad? Are there cases where such a pass
would not be applicable to multiple tests? Are there cases where such a
pass would not be useful for humans during development? Cases where
this would be true don't particularly come to mind, but I'm definitely
interested in what everyone else thinks.

When I teach my compilers class, I tell my students to liberally add
the ability to serialize all of their internal data structures to
interpretable text. It will seem like extra work at first, but when
they're trying to debug things later, it will be really helpful. I
think this is a key lesson that I, at least, have learned from LLVM. It
makes us all more productive in the end (in part because we often spend
much more time debugging our code than writing it in the first place).
Firing up an actual debugger is slow and (despite our best efforts)
fragile; changing a textual input and running it through something that
produces textual output is fast.
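The pattern is also cheap to follow. A minimal sketch (the structure
here is a made-up example, not any particular LLVM class):

  #include "llvm/Support/Compiler.h"
  #include "llvm/Support/raw_ostream.h"

  // A toy internal data structure that can always be serialized to
  // readable text.
  struct Interval {
    unsigned Lo, Hi;

    // Printing to an arbitrary stream is the primitive every
    // structure gets.
    void print(llvm::raw_ostream &OS) const {
      OS << "[" << Lo << ", " << Hi << ")";
    }

    // dump() is the debugging convenience built on top of print().
    LLVM_DUMP_METHOD void dump() const {
      print(llvm::errs());
      llvm::errs() << "\n";
    }
  };

Once every structure can print itself, the textual analysis outputs we
are discussing largely fall out of it.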
> As an example, take `clang -verify` tests. It is certainly possible
> to check diagnostic output using FileCheck, so why does clang have a
> -verify option?

AFAIK, to make it easy to write tests that verify that particular
messages appear with specific line numbers relative to the code in the
test, without hard-coding the particular line numbers in the test
(including with offsets, etc.). FileCheck probably could have been
enhanced to do this directly (especially with all of its recent
enhancements).

 -Hal

> Michael

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory