thr3ads.net - llvm dev - [llvm-dev] [RFC] Compiled regression tests. [Jul 2020]

If this information is useful, please help other people find it:
Share via:

Michael Kruse via llvm-dev

2020-Jun-24 16:43 UTC

[llvm-dev] [RFC] Compiled regression tests.

Am Mi., 24. Juni 2020 um 10:12 Uhr schrieb David Blaikie <dblaikie at
gmail.com>:> > As mentioned in the Differential, generating the tests automatically
> > will lose information about what actually is intended to be tested,
>
> Agreed - and I didn't mean to suggest tests should be automatically
> generated. I work pretty hard in code reviews to encourage tests to be
> written as stable-y as possible so they are resilient to unrelated
> changes. The python scripts I wrote to update tests didn't require
> tests to be written in an automated fashion but were still able to
> migrate the significant majority of hand-written FileCheck test cases.
My argument is that it is hard to impossible to really only test the
relevant bits using FileCheck. CHECK-DAG, named regexes etc are
mechanisms making this possible, but at the same time make the
verification more complicated.
I don't think it is feasible to write single-purpose update scripts
for most changes. Even if there is one, it is even less feasible to
ensure for all tests that they still test what was originally
intended, especially with CHECK-NOT.

I had to put a lot of effort into updating loop metadata tests.
Because metadata nodes have sequential numbers in order the IR emitter
decides to put them, it is tempting to use the form ![[NODENAME:.*]]
for each occurance so you can reorder the lines in the order they
occur, as indeed I found in many regression tests. Three problems with
this:
1. When the regex is specified, it will overwrite the content of
previous placeholders, i.e. if used everywhere, it is equivalent to
{{.*}}
2. Using a backtracking regex engine, there are inputs with
exponential time behaviour.having mistake
3. It will match more than needed, e.g. a list of nodes
{![[NODENAME:.*]]} will also match !{!0, !1} and FileCheck will
complain somewhere else that

!0 = {!0, !1}

does not match

![[NODENAME]] = {![[NODENAME]], ![[LOOP1:.*]]}

(if not the 'regex everywhere' mistake was made)

A "robust" test would use [[NODENAME:[0-9]+]] and CHECK-DAG, as some
tests do, but also making the CHECK lines even longer and more
cluttered. In contrast to instructions, not all metadata lines have
recognizable keywords that could indicate what it might intend to
match. CHECK-DAG will happily match with any metadata node that has
the same number of operands, deferring a mismatch report to some later
CHECK line, but due to the DAG part continuing to match with previous
lines.

Unfortunately, most tests we have do not even use placeholders for
metadata nodes, including those generated by update_test_checks.py. I
looked into improving the script in this regard, but its
implementation is function-centric, making it difficult to add
module-level placeholders.

As a result, I estimate that I had to invest about twice the time to
update/fix tests than writing the code changes themselves. Due to my
experience, I find updating FileCheck tests very frustrating.
I'd prefer not to test whether specific metadata nodes are present,
but whether the LLVM API to query them (such as `isAnnotatedParallel`)
returns the expected result.

Michael

Michael Kruse via llvm-dev

2020-Jul-01 05:40 UTC

head link

[llvm-dev] [RFC] Compiled regression tests.

To illustrate some difficulties with FileCheck, lets make a non-semantic
change in LLVM:

    --- a/llvm/lib/Analysis/VectorUtils.cpp
    +++ b/llvm/lib/Analysis/VectorUtils.cpp
    @@ -642,8 +642,8 @@ MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1,
MDNode *AccGroups2) {
         return AccGroups1;

       SmallSetVector<Metadata *, 4> Union;
    -  addToAccessGroupList(Union, AccGroups1);
       addToAccessGroupList(Union, AccGroups2);
    +  addToAccessGroupList(Union, AccGroups1);

       if (Union.size() == 0)
         return nullptr;

This changes the order of access groups when merging them.
Same background: Access groups are used to mark that a side-effect (e.g.
memory access) does not limit the parallelization of a loop, added by e.g.
#pragma clang loop vectorize(assume_safety). Therefore a loop can be
assumed to be parallelizable without further dependency analysis if all its
side-effect instructions are marked this way (and has no loop-carried
scalar dependencies).
An access group represents the instructions marked for a specific loop. A
single instruction can be parallel with regrads to multiple loop, i.e.
belong to multiple access groups. The order of the access groups is
insignificant, i.e. whether either `AccGroups1` or `AccGroups2` comes
first, the instruction is marked parallel for both loops.

Even the change should have no effect on correctness, `ninja check-llvm`
fails:

    test/Transforms/Inline/parallel-loop-md-merge.ll

     string not found in input
    ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_INNER]]}
             ^
    <stdin>:45:1: note: scanning from here
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESSES_INNER" equal to
"7"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESS_GROUP_INNER" equal to
"4"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^

The line FileCheck points to has nothing to do with the changed order of
access groups.
(what's the point of annotating the `!7 = ...` line repeatedly? Why not
pointing to the lines where ACCESSES_INNER/ACCESS_GROUP_INNER have their
values from?)

The CHECK lines, annotated with their line numbers from
parallel-loop-md-merge.ll, are:

    68 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group
![[ACCESS_GROUP_LIST_3:[0-9]+]]
    69 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
    70 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
    71
    72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] !{![[ACCESS_GROUP_INNER:[0-9]+]],
![[ACCESS_GROUP_OUTER:[0-9]+]]}
    73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
    74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
    75 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]],
![[ACCESSES_INNER:[0-9]+]]}
    76 ; CHECK: ![[ACCESSES_INNER]] =
!{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_INNER]]}
    77 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]],
![[ACCESSES_OUTER:[0-9]+]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] =
!{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_OUTER]]}

And the output of `FileCheck --dump-input -v` is:

         38: !0 = !{!1}
         39: !1 = distinct !{!1, !2, !"callee: %A"}
         40: !2 = distinct !{!2, !"callee"}
         41: !3 = !{!4, !5}
check:72     ^~~~~~~~~~~~~~
         42: !4 = distinct !{}
check:73     ^~~~~~~~~~~~~~~~~
         43: !5 = distinct !{}
check:74     ^~~~~~~~~~~~~~~~~
         44: !6 = distinct !{!6, !7}
check:75     ^~~~~~~~~~~~~~~~~~~~~~~
         45: !7 = !{!"llvm.loop.parallel_accesses", !5}
check:76     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match
found
         46: !8 = distinct !{!8, !9}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~
         47: !9 = !{!"llvm.loop.parallel_accesses", !4}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To fix this regression test, one needs to solve the following questions:

   -  Is `!0 = !{!1}` supposed to match with anything?
   -  Which of the CHECK lines are the ones that are important for the
   regression tests?
   -  The `![[SOMETHING]] = distinct !{}` line appears twice. Which one is
   ACCESS_GROUP_INNER/ACCESS_GROUP_OUTER?
   -  There are two lines with `llvm.loop.parallel_accesses` as well. The
   author of this regression test was kind enough to give the placeholders
   descriptive names, but do they match with what FileCheck matched?
   -  LOOP_INNER and LOOP_OUTER are only distinguished by position and
   `.i`. Why does the inner loop come first?
   -  Before the patch that modifies the order, the output of
   --dump-input=always was:

     38: !0 = !{!1}
     39: !1 = distinct !{!1, !2, !"callee: %A"}
     40: !2 = distinct !{!2, !"callee"}
     41: !3 = !{!4, !5}
     42: !4 = distinct !{}
     43: !5 = distinct !{}
     44: !6 = distinct !{!6, !7}
     45: !7 = !{!"llvm.loop.parallel_accesses", !4}
     46: !8 = distinct !{!8, !9}
     47: !9 = !{!"llvm.loop.parallel_accesses", !5}

This seems to suggest that our change caused !4 and !5 in lines 45 and 47
to swap. This is not what the patch did, which changes to order of two
MDNode arguments. The only MDNode that has two other MDNodes as operands is
line (line !6 and !8 are representing loops as they reference themselves as
first arguments; noalias scopes do this as well, but there is no noalias
metadata in this example).

An hasty fix is to change CHECK lines 76 and 77 to

    76 ; CHECK: ![[ACCESSES_INNER]] =
!{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_OUTER]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] =
!{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_INNER]]}

However, ACCESS_GROUP_OUTER belongs to ACCESSES_INNER and
ACCESS_GROUP_INNER to ACCESSES_OUTER? This cannot be true.


The reason is that MDNodes are emitted in topological order of a
depth-first search. The patch changed the order of ACCESS_GROUP_INNER and
ACCESS_GROUP_OUTER in line 41 but because in the mitted IR file, the first
argument of !3 is emitted before the second argument, ACCESS_GROUP_OUTER is
on line 42 and ACCESS_GROUP_INNER on line 43 instead.

Even though this regression test uses proper regex-ing of metadata node
numbers, it is still fragile as it fails on irrelevant changes. Once it
fails, it requires time to fix properly because it is non-obvious which
lines are intended to match which output lines. The hasty fix would have
mixed up ACCESS_GROUP_OUTER/ACCESS_GROUP_INNER and I wouldn't expect
someone who is working on the inliner to take the time to understand all
the loop metadata. Set-semantics of metadata occurs often, such as noalias
metadata, loop properties, tbaa metadata, etc.

The difficulty gets amplified when there are multiple functions in the
regression tests; metadata of all functions all appended to the end and
MDNodes with same content uniqued, making it impossible to associate
particular MDNodes with specific functions.


Ideally the regression test would be robust and understandable, achievable
with two asserts in a unittest:

    Loop &OuterLoop = *LI->begin();
    ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
    Loop &InnerLoop = *OuterLoop.begin();
    ASSERT_TRUE(InnerLoop.isAnnotatedParallel());
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200701/0aae6e4f/attachment.html>

Hal Finkel via llvm-dev

2020-Jul-01 14:33 UTC

head link

[llvm-dev] [RFC] Compiled regression tests.

On 7/1/20 12:40 AM, Michael Kruse via llvm-dev wrote:> To illustrate some difficulties with FileCheck, lets make a 
> non-semantic change in LLVM:
>
>     --- a/llvm/lib/Analysis/VectorUtils.cpp
>     +++ b/llvm/lib/Analysis/VectorUtils.cpp
>     @@ -642,8 +642,8 @@ MDNode *llvm::uniteAccessGroups(MDNode 
> *AccGroups1, MDNode *AccGroups2) {
>          return AccGroups1;
>
>        SmallSetVector<Metadata *, 4> Union;
>     -  addToAccessGroupList(Union, AccGroups1);
>        addToAccessGroupList(Union, AccGroups2);
>     +  addToAccessGroupList(Union, AccGroups1);
>
>        if (Union.size() == 0)
>          return nullptr;
>
> This changes the order of access groups when merging them.
> Same background: Access groups are used to mark that a side-effect 
> (e.g. memory access) does not limit the parallelization of a loop, 
> added by e.g. #pragma clang loop vectorize(assume_safety). Therefore a 
> loop can be assumed to be parallelizable without further dependency 
> analysis if all its side-effect instructions are marked this way (and 
> has no loop-carried scalar dependencies).
> An access group represents the instructions marked for a specific 
> loop. A single instruction can be parallel with regrads to multiple 
> loop, i.e. belong to multiple access groups. The order of the access 
> groups is insignificant, i.e. whether either `AccGroups1` or 
> `AccGroups2` comes first, the instruction is marked parallel for both 
> loops.
>
> Even the change should have no effect on correctness, `ninja 
> check-llvm` fails:
>
>     test/Transforms/Inline/parallel-loop-md-merge.ll
>
>      string not found in input
>     ; CHECK: ![[ACCESSES_INNER]] =
!{!"llvm.loop.parallel_accesses",
> ![[ACCESS_GROUP_INNER]]}
>              ^
>     <stdin>:45:1: note: scanning from here
>     !7 = !{!"llvm.loop.parallel_accesses", !5}
>     ^
>     <stdin>:45:1: note: with "ACCESSES_INNER" equal to
"7"
>     !7 = !{!"llvm.loop.parallel_accesses", !5}
>     ^
>     <stdin>:45:1: note: with "ACCESS_GROUP_INNER" equal to
"4"
>     !7 = !{!"llvm.loop.parallel_accesses", !5}
>     ^
>
> The line FileCheck points to has nothing to do with the changed order 
> of access groups.
> (what's the point of annotating the `!7 = ...` line repeatedly? Why 
> not pointing to the lines where ACCESSES_INNER/ACCESS_GROUP_INNER have 
> their values from?)
>
> The CHECK lines, annotated with their line numbers from 
> parallel-loop-md-merge.ll, are:
>
>     68 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group 
> ![[ACCESS_GROUP_LIST_3:[0-9]+]]
>     69 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
>     70 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
>     71
>     72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = 
> !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
>     73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
>     74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
>     75 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], 
> ![[ACCESSES_INNER:[0-9]+]]}
>     76 ; CHECK: ![[ACCESSES_INNER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
>     77 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], 
> ![[ACCESSES_OUTER:[0-9]+]]}
>     78 ; CHECK: ![[ACCESSES_OUTER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
>
> And the output of `FileCheck --dump-input -v` is:
>
>          38: !0 = !{!1}
>          39: !1 = distinct !{!1, !2, !"callee: %A"}
>          40: !2 = distinct !{!2, !"callee"}
>          41: !3 = !{!4, !5}
> check:72     ^~~~~~~~~~~~~~
>          42: !4 = distinct !{}
> check:73     ^~~~~~~~~~~~~~~~~
>          43: !5 = distinct !{}
> check:74     ^~~~~~~~~~~~~~~~~
>          44: !6 = distinct !{!6, !7}
> check:75     ^~~~~~~~~~~~~~~~~~~~~~~
>          45: !7 = !{!"llvm.loop.parallel_accesses", !5}
> check:76     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no 
> match found
>          46: !8 = distinct !{!8, !9}
> check:76     ~~~~~~~~~~~~~~~~~~~~~~~
>          47: !9 = !{!"llvm.loop.parallel_accesses", !4}
> check:76     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> To fix this regression test, one needs to solve the following questions:
>
>   *  Is `!0 = !{!1}` supposed to match with anything?
>   *  Which of the CHECK lines are the ones that are important for the
>     regression tests?
>   *  The `![[SOMETHING]] = distinct !{}` line appears twice. Which one
>     is ACCESS_GROUP_INNER/ACCESS_GROUP_OUTER?
>   *  There are two lines with `llvm.loop.parallel_accesses` as well.
>     The author of this regression test was kind enough to give the
>     placeholders descriptive names, but do they match with what
>     FileCheck matched?
>   *  LOOP_INNER and LOOP_OUTER are only distinguished by position and
>     `.i`. Why does the inner loop come first?
>   *  Before the patch that modifies the order, the output of
>     --dump-input=always was:
>
>      38: !0 = !{!1}
>      39: !1 = distinct !{!1, !2, !"callee: %A"}
>      40: !2 = distinct !{!2, !"callee"}
>      41: !3 = !{!4, !5}
>      42: !4 = distinct !{}
>      43: !5 = distinct !{}
>      44: !6 = distinct !{!6, !7}
>      45: !7 = !{!"llvm.loop.parallel_accesses", !4}
>      46: !8 = distinct !{!8, !9}
>      47: !9 = !{!"llvm.loop.parallel_accesses", !5}
>
> This seems to suggest that our change caused !4 and !5 in lines 45 and 
> 47 to swap. This is not what the patch did, which changes to order of 
> two MDNode arguments. The only MDNode that has two other MDNodes as 
> operands is line (line !6 and !8 are representing loops as they 
> reference themselves as first arguments; noalias scopes do this as 
> well, but there is no noalias metadata in this example).
>
> An hasty fix is to change CHECK lines 76 and 77 to
>
>     76 ; CHECK: ![[ACCESSES_INNER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
>     78 ; CHECK: ![[ACCESSES_OUTER]] = 
> !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
>
> However, ACCESS_GROUP_OUTER belongs to ACCESSES_INNER and 
> ACCESS_GROUP_INNER to ACCESSES_OUTER? This cannot be true.
>
>
> The reason is that MDNodes are emitted in topological order of a 
> depth-first search. The patch changed the order of ACCESS_GROUP_INNER 
> and ACCESS_GROUP_OUTER in line 41 but because in the mitted IR file, 
> the first argument of !3 is emitted before the second argument, 
> ACCESS_GROUP_OUTER is on line 42 and ACCESS_GROUP_INNER on line 43 
> instead.
>
> Even though this regression test uses proper regex-ing of metadata 
> node numbers, it is still fragile as it fails on irrelevant changes. 
> Once it fails, it requires time to fix properly because it is 
> non-obvious which lines are intended to match which output lines. The 
> hasty fix would have mixed up ACCESS_GROUP_OUTER/ACCESS_GROUP_INNER 
> and I wouldn't expect someone who is working on the inliner to take 
> the time to understand all the loop metadata. Set-semantics of 
> metadata occurs often, such as noalias metadata, loop properties, tbaa 
> metadata, etc.
>
> The difficulty gets amplified when there are multiple functions in the 
> regression tests; metadata of all functions all appended to the end 
> and MDNodes with same content uniqued, making it impossible to 
> associate particular MDNodes with specific functions.
>
>
> Ideally the regression test would be robust and understandable, 
> achievable with two asserts in a unittest:
>
>     Loop &OuterLoop = *LI->begin();
>     ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
>     Loop &InnerLoop = *OuterLoop.begin();
>     ASSERT_TRUE(InnerLoop.isAnnotatedParallel());

I definitely agree that we should not be trying to do this kind of 
checking using textual metadata-node matching in FileCheck. The 
alternative already available is to add an analysis pass with some kind 
of verifier output. This output, not the raw metadata itself, can be 
checked by FileCheck. We also need to check the verification code, but 
at least that's something we can keep just in one place. For parallel 
annotations, we already have such a thing (we can run opt -loops 
-analyze; e.g., in 
test/Analysis/LoopInfo/annotated-parallel-complex.ll). We also do this 
kind of thing for the cost model (by running with -cost-model -analyze). 
To what extent would making more-extensive use of this technique address 
the use cases you're trying to address?

Maybe we should, also, just make more-extensive use of unit tests?

Thanks again,

Hal


>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200701/09eb3530/attachment.html>

Robinson, Paul via llvm-dev

2020-Jul-01 15:18 UTC

head link

[llvm-dev] [RFC] Compiled regression tests.

The test as written is fragile because it requires a certain ordering.  If the
output order is not important, use CHECK-DAG rather than CHECK.  This would be a
failure to understand the testing tool.

My experience, over a 40-year career, is that good software developers are
generally not very good test-writers.  These are different skills and good
testing is frequently not taught.  It’s easy to write fragile tests; you make
your patch, you see what comes out, and you write the test to expect exactly
that output, using the minimum possible features of the testing tool.  This is
poor technique all around.  We even have scripts that automatically generate
such tests, used primarily in codegen tests.  I devoutly hope that the people
who produce those tests responsibly eyeball all those cases.

The proposal appears to be to migrate output-based tests (using
ever-more-complicated FileCheck features) to executable tests, which makes it
more like the software development people are used to instead of test-writing. 
But I don’t see that necessarily solving the problem; seems like it would be
just as easy to write a program that doesn’t do the right thing as to write a
FileCheck test that doesn’t do the right thing.

Hal’s suggestion is more to the point:  If the output we’re generating is not
appropriate to the kinds of tests we want to perform, it can be worthwhile to
generate different kinds of output.  MIR is a case in point; for a long time it
was hard to introspect into the interval between IR and final machine code, but
now it’s a lot easier.
--paulr

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Michael
Kruse via llvm-dev
Sent: Wednesday, July 1, 2020 1:40 AM
To: Michael Kruse <llvmdev at meinersbur.de>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Compiled regression tests.

To illustrate some difficulties with FileCheck, lets make a non-semantic change
in LLVM:

    --- a/llvm/lib/Analysis/VectorUtils.cpp
    +++ b/llvm/lib/Analysis/VectorUtils.cpp
    @@ -642,8 +642,8 @@ MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1,
MDNode *AccGroups2) {
         return AccGroups1;

       SmallSetVector<Metadata *, 4> Union;
    -  addToAccessGroupList(Union, AccGroups1);
       addToAccessGroupList(Union, AccGroups2);
    +  addToAccessGroupList(Union, AccGroups1);

       if (Union.size() == 0)
         return nullptr;

This changes the order of access groups when merging them.
Same background: Access groups are used to mark that a side-effect (e.g. memory
access) does not limit the parallelization of a loop, added by e.g. #pragma
clang loop vectorize(assume_safety). Therefore a loop can be assumed to be
parallelizable without further dependency analysis if all its side-effect
instructions are marked this way (and has no loop-carried scalar dependencies).
An access group represents the instructions marked for a specific loop. A single
instruction can be parallel with regrads to multiple loop, i.e. belong to
multiple access groups. The order of the access groups is insignificant, i.e.
whether either `AccGroups1` or `AccGroups2` comes first, the instruction is
marked parallel for both loops.

Even the change should have no effect on correctness, `ninja check-llvm` fails:

    test/Transforms/Inline/parallel-loop-md-merge.ll

     string not found in input
    ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses",
![[ACCESS_GROUP_INNER]]}
             ^
    <stdin>:45:1: note: scanning from here
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESSES_INNER" equal to
"7"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESS_GROUP_INNER" equal to
"4"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^

The line FileCheck points to has nothing to do with the changed order of access
groups.
(what's the point of annotating the `!7 = ...` line repeatedly? Why not
pointing to the lines where ACCESSES_INNER/ACCESS_GROUP_INNER have their values
from?)

The CHECK lines, annotated with their line numbers from
parallel-loop-md-merge.ll, are:

    68 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group
![[ACCESS_GROUP_LIST_3:[0-9]+]]
    69 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
    70 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
    71
    72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]],
![[ACCESS_GROUP_OUTER:[0-9]+]]}
    73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
    74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
    75 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]],
![[ACCESSES_INNER:[0-9]+]]}
    76 ; CHECK: ![[ACCESSES_INNER]] =
!{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
    77 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]],
![[ACCESSES_OUTER:[0-9]+]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] =
!{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}

And the output of `FileCheck --dump-input -v` is:

         38: !0 = !{!1}
         39: !1 = distinct !{!1, !2, !"callee: %A"}
         40: !2 = distinct !{!2, !"callee"}
         41: !3 = !{!4, !5}
check:72     ^~~~~~~~~~~~~~
         42: !4 = distinct !{}
check:73     ^~~~~~~~~~~~~~~~~
         43: !5 = distinct !{}
check:74     ^~~~~~~~~~~~~~~~~
         44: !6 = distinct !{!6, !7}
check:75     ^~~~~~~~~~~~~~~~~~~~~~~
         45: !7 = !{!"llvm.loop.parallel_accesses", !5}
check:76     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
         46: !8 = distinct !{!8, !9}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~
         47: !9 = !{!"llvm.loop.parallel_accesses", !4}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To fix this regression test, one needs to solve the following questions:

  *    Is `!0 = !{!1}` supposed to match with anything?
  *    Which of the CHECK lines are the ones that are important for the
regression tests?
  *    The `![[SOMETHING]] = distinct !{}` line appears twice. Which one is
ACCESS_GROUP_INNER/ACCESS_GROUP_OUTER?
  *    There are two lines with `llvm.loop.parallel_accesses` as well. The
author of this regression test was kind enough to give the placeholders
descriptive names, but do they match with what FileCheck matched?
  *    LOOP_INNER and LOOP_OUTER are only distinguished by position and `.i`.
Why does the inner loop come first?
  *    Before the patch that modifies the order, the output of
--dump-input=always was:
     38: !0 = !{!1}
     39: !1 = distinct !{!1, !2, !"callee: %A"}
     40: !2 = distinct !{!2, !"callee"}
     41: !3 = !{!4, !5}
     42: !4 = distinct !{}
     43: !5 = distinct !{}
     44: !6 = distinct !{!6, !7}
     45: !7 = !{!"llvm.loop.parallel_accesses", !4}
     46: !8 = distinct !{!8, !9}
     47: !9 = !{!"llvm.loop.parallel_accesses", !5}

This seems to suggest that our change caused !4 and !5 in lines 45 and 47 to
swap. This is not what the patch did, which changes to order of two MDNode
arguments. The only MDNode that has two other MDNodes as operands is line (line
!6 and !8 are representing loops as they reference themselves as first
arguments; noalias scopes do this as well, but there is no noalias metadata in
this example).

An hasty fix is to change CHECK lines 76 and 77 to

    76 ; CHECK: ![[ACCESSES_INNER]] =
!{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] =
!{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}

However, ACCESS_GROUP_OUTER belongs to ACCESSES_INNER and ACCESS_GROUP_INNER to
ACCESSES_OUTER? This cannot be true.


The reason is that MDNodes are emitted in topological order of a depth-first
search. The patch changed the order of ACCESS_GROUP_INNER and ACCESS_GROUP_OUTER
in line 41 but because in the mitted IR file, the first argument of !3 is
emitted before the second argument, ACCESS_GROUP_OUTER is on line 42 and
ACCESS_GROUP_INNER on line 43 instead.

Even though this regression test uses proper regex-ing of metadata node numbers,
it is still fragile as it fails on irrelevant changes. Once it fails, it
requires time to fix properly because it is non-obvious which lines are intended
to match which output lines. The hasty fix would have mixed up
ACCESS_GROUP_OUTER/ACCESS_GROUP_INNER and I wouldn't expect someone who is
working on the inliner to take the time to understand all the loop metadata.
Set-semantics of metadata occurs often, such as noalias metadata, loop
properties, tbaa metadata, etc.

The difficulty gets amplified when there are multiple functions in the
regression tests; metadata of all functions all appended to the end and MDNodes
with same content uniqued, making it impossible to associate particular MDNodes
with specific functions.


Ideally the regression test would be robust and understandable, achievable with
two asserts in a unittest:

    Loop &OuterLoop = *LI->begin();
    ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
    Loop &InnerLoop = *OuterLoop.begin();
    ASSERT_TRUE(InnerLoop.isAnnotatedParallel());
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200701/a5c5b6e4/attachment-0001.html>

David Greene via llvm-dev

2020-Jul-01 18:26 UTC

head link

[llvm-dev] [RFC] Compiled regression tests.

Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Unfortunately, most tests we have do not even use placeholders for
> metadata nodes, including those generated by update_test_checks.py. I
> looked into improving the script in this regard, but its
> implementation is function-centric, making it difficult to add
> module-level placeholders.
I actually have an implementation that does this for the update scripts.
It's on my queue to upstream.  In addition I have a number of
enhancements that allow the update scripts to generate more focused,
robust tests.  The first of these is D68230 which is currently stuck and
I'd appreciate some more eyes on it.
> As a result, I estimate that I had to invest about twice the time to
> update/fix tests than writing the code changes themselves. Due to my
> experience, I find updating FileCheck tests very frustrating.
> I'd prefer not to test whether specific metadata nodes are present,
> but whether the LLVM API to query them (such as `isAnnotatedParallel`)
> returns the expected result.
I think both could be useful.  With my update script enhancements I was
able to successfully match specific metadata nodes and check them
against metadata definitions at the Module level, though I'll admit my
use-case is probably simpler than yours.

                -David

David Greene via llvm-dev

2020-Jul-01 18:38 UTC

head link

[llvm-dev] [RFC] Compiled regression tests.

Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> writes:
> The reason is that MDNodes are emitted in topological order of a
> depth-first search. The patch changed the order of ACCESS_GROUP_INNER and
> ACCESS_GROUP_OUTER in line 41 but because in the mitted IR file, the first
> argument of !3 is emitted before the second argument, ACCESS_GROUP_OUTER is
> on line 42 and ACCESS_GROUP_INNER on line 43 instead.
I have run into this problem as well and my solution was to use
CHECK-DAG.  That might not work in all cases though.

There's another fragility problem with the test.  It assumes that the
!llvm.loop metadata is the only thing printed on the branch
instructions.  That might not be true forever.  The !llvm.access.group
checks work around this with {{.*}} to snarf up other metadata but a
more robust solution is probably CHECK-SAME.  CHECK-SAME is pretty much
required if you want to match more than one kind of metadata node on a
line while being robust in the face of future changes.

These are all issues I ran into when enhancing the update scripts to
handle metadata.

Michael's proposal definitely has merit.  More semantically-precise
tests are more robust.  But there's also a lot of benefit in enhancing
the existing test update scripts.

                 -David

Michael Kruse via llvm-dev

2020-Jul-02 05:35 UTC

head link

[llvm-dev] [RFC] Compiled regression tests.

Am Mi., 1. Juli 2020 um 13:26 Uhr schrieb David Greene <david.greene at
hpe.com>:>
> Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
> > Unfortunately, most tests we have do not even use placeholders for
> > metadata nodes, including those generated by update_test_checks.py. I
> > looked into improving the script in this regard, but its
> > implementation is function-centric, making it difficult to add
> > module-level placeholders.
>
> I actually have an implementation that does this for the update scripts.
> It's on my queue to upstream.  In addition I have a number of
> enhancements that allow the update scripts to generate more focused,
> robust tests.  The first of these is D68230 which is currently stuck and
> I'd appreciate some more eyes on it.
I'd appreciate it if this was added. However, I think that generating
tests only alleviates the symptoms of a deeper problem.

Regarding D68230, the problem seems to be that there are currently no
tests in upstream clang that would use it. I don't see how I could
help there.

> > As a result, I estimate that I had to invest about twice the time to
> > update/fix tests than writing the code changes themselves. Due to my
> > experience, I find updating FileCheck tests very frustrating.
> > I'd prefer not to test whether specific metadata nodes are
present,
> > but whether the LLVM API to query them (such as `isAnnotatedParallel`)
> > returns the expected result.
>
> I think both could be useful.  With my update script enhancements I was
> able to successfully match specific metadata nodes and check them
> against metadata definitions at the Module level, though I'll admit my
> use-case is probably simpler than yours.
I also think FileCheck can be the right tool for many cases. For
instance, checking the result of InstCombine where there is no
variation in the output. As soon eas regex placeholders are added, I
get sceptical.

Michael

llvm dev - Jul 2020 - [RFC] Compiled regression tests.

[llvm-dev] [RFC] Compiled regression tests.

[llvm-dev] [RFC] Compiled regression tests.

[llvm-dev] [RFC] Compiled regression tests.

[llvm-dev] [RFC] Compiled regression tests.

[llvm-dev] [RFC] Compiled regression tests.

[llvm-dev] [RFC] Compiled regression tests.

[llvm-dev] [RFC] Compiled regression tests.