Robinson, Paul via llvm-dev
2020-Jul-01 15:18 UTC
[llvm-dev] [RFC] Compiled regression tests.
The test as written is fragile because it requires a certain ordering. If the output order is not important, use CHECK-DAG rather than CHECK. This would be a failure to understand the testing tool.

My experience, over a 40-year career, is that good software developers are generally not very good test-writers. These are different skills, and good testing is frequently not taught. It's easy to write fragile tests: you make your patch, you see what comes out, and you write the test to expect exactly that output, using the minimum possible features of the testing tool. This is poor technique all around. We even have scripts that automatically generate such tests, used primarily in codegen tests. I devoutly hope that the people who produce those tests responsibly eyeball all those cases.

The proposal appears to be to migrate output-based tests (using ever-more-complicated FileCheck features) to executable tests, which makes them more like the software development people are used to instead of test-writing. But I don't see that necessarily solving the problem; it seems like it would be just as easy to write a program that doesn't do the right thing as to write a FileCheck test that doesn't do the right thing.

Hal's suggestion is more to the point: if the output we're generating is not appropriate to the kinds of tests we want to perform, it can be worthwhile to generate different kinds of output. MIR is a case in point; for a long time it was hard to introspect into the interval between IR and final machine code, but now it's a lot easier.

--paulr

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Michael Kruse via llvm-dev
Sent: Wednesday, July 1, 2020 1:40 AM
To: Michael Kruse <llvmdev at meinersbur.de>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Compiled regression tests.

To illustrate some difficulties with FileCheck, let's make a non-semantic change in LLVM:

    --- a/llvm/lib/Analysis/VectorUtils.cpp
    +++ b/llvm/lib/Analysis/VectorUtils.cpp
    @@ -642,8 +642,8 @@ MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1, MDNode *AccGroups2) {
         return AccGroups1;

       SmallSetVector<Metadata *, 4> Union;
    -  addToAccessGroupList(Union, AccGroups1);
       addToAccessGroupList(Union, AccGroups2);
    +  addToAccessGroupList(Union, AccGroups1);

       if (Union.size() == 0)
         return nullptr;

This changes the order of access groups when merging them. Some background: access groups are used to mark that a side effect (e.g. a memory access) does not limit the parallelization of a loop; they are added by e.g. #pragma clang loop vectorize(assume_safety). Therefore a loop can be assumed to be parallelizable without further dependency analysis if all its side-effect instructions are marked this way (and it has no loop-carried scalar dependencies). An access group represents the instructions marked for a specific loop. A single instruction can be parallel with regard to multiple loops, i.e. belong to multiple access groups. The order of the access groups is insignificant: whether `AccGroups1` or `AccGroups2` comes first, the instruction is marked parallel for both loops.
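The order-sensitivity described here comes from SmallSetVector, which deduplicates like a set but iterates in insertion order. A minimal standalone sketch of that behavior (with ints standing in for the Metadata pointers; the values are hypothetical):

    #include "llvm/ADT/SmallSetVector.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    int main() {
      // Hypothetical stand-ins for the two access-group lists.
      int AccGroups1[] = {1, 2};
      int AccGroups2[] = {2, 3};

      // SmallSetVector drops duplicates but keeps insertion order, so the
      // order of the two loops below determines the order of the result.
      SmallSetVector<int, 4> Union;
      for (int G : AccGroups2) // patched order: AccGroups2 first
        Union.insert(G);
      for (int G : AccGroups1) // the duplicate 2 is silently dropped
        Union.insert(G);

      // Prints "2 3 1"; with the original insertion order (AccGroups1
      // first) it would print "1 2 3". Same set, different order.
      for (int G : Union)
        outs() << G << " ";
      outs() << "\n";
      return 0;
    }

The union is the same either way; only the order in which the groups are later emitted as metadata changes, which is exactly what trips the test below.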
Even though the change should have no effect on correctness, `ninja check-llvm` fails:

    test/Transforms/Inline/parallel-loop-md-merge.ll
    string not found in input
    ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
             ^
    <stdin>:45:1: note: scanning from here
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESSES_INNER" equal to "7"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^
    <stdin>:45:1: note: with "ACCESS_GROUP_INNER" equal to "4"
    !7 = !{!"llvm.loop.parallel_accesses", !5}
    ^

The line FileCheck points to has nothing to do with the changed order of access groups. (What is the point of annotating the `!7 = ...` line repeatedly? Why not point to the lines where ACCESSES_INNER/ACCESS_GROUP_INNER got their values?)

The CHECK lines, annotated with their line numbers from parallel-loop-md-merge.ll, are:

    68 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group ![[ACCESS_GROUP_LIST_3:[0-9]+]]
    69 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
    70 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
    71
    72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
    73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
    74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
    75 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], ![[ACCESSES_INNER:[0-9]+]]}
    76 ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
    77 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], ![[ACCESSES_OUTER:[0-9]+]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}

And the output of `FileCheck --dump-input -v` is:

    38: !0 = !{!1}
    39: !1 = distinct !{!1, !2, !"callee: %A"}
    40: !2 = distinct !{!2, !"callee"}
    41: !3 = !{!4, !5}
    check:72     ^~~~~~~~~~~~~~
    42: !4 = distinct !{}
    check:73     ^~~~~~~~~~~~~~~~~
    43: !5 = distinct !{}
    check:74     ^~~~~~~~~~~~~~~~~
    44: !6 = distinct !{!6, !7}
    check:75     ^~~~~~~~~~~~~~~~~~~~~~~
    45: !7 = !{!"llvm.loop.parallel_accesses", !5}
    check:76     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
    46: !8 = distinct !{!8, !9}
    check:76     ~~~~~~~~~~~~~~~~~~~~~~~
    47: !9 = !{!"llvm.loop.parallel_accesses", !4}
    check:76     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To fix this regression test, one needs to answer the following questions:

* Is `!0 = !{!1}` supposed to match anything?
* Which of the CHECK lines are the ones that are important for the regression test?
* The `![[SOMETHING]] = distinct !{}` line appears twice. Which one is ACCESS_GROUP_INNER and which is ACCESS_GROUP_OUTER?
* There are two lines with `llvm.loop.parallel_accesses` as well. The author of this regression test was kind enough to give the placeholders descriptive names, but do they match what FileCheck matched?
* LOOP_INNER and LOOP_OUTER are only distinguished by position and `.i`. Why does the inner loop come first?

Before the patch that modified the order, the output of --dump-input=always was:

    38: !0 = !{!1}
    39: !1 = distinct !{!1, !2, !"callee: %A"}
    40: !2 = distinct !{!2, !"callee"}
    41: !3 = !{!4, !5}
    42: !4 = distinct !{}
    43: !5 = distinct !{}
    44: !6 = distinct !{!6, !7}
    45: !7 = !{!"llvm.loop.parallel_accesses", !4}
    46: !8 = distinct !{!8, !9}
    47: !9 = !{!"llvm.loop.parallel_accesses", !5}

This seems to suggest that our change caused !4 and !5 in lines 45 and 47 to swap. But that is not what the patch did; it changed the order of two MDNode arguments.
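For reference, the placeholder mechanics these questions hinge on: `[[NAME:regex]]` captures whatever the regex matches into NAME, and a later bare `[[NAME]]` must match exactly the captured text. A hypothetical two-line example in the style of the test above:

    ; CHECK: ![[GROUP:[0-9]+]] = distinct !{}
    ; CHECK: !{!"llvm.loop.parallel_accesses", ![[GROUP]]}

Whether the test passes therefore depends on which concrete metadata number each placeholder happens to capture first, and that is determined by the emission order the patch changed.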
The only MDNode that has two other MDNodes as operands is the one on line 41 (!6 and !8 represent loops, as they reference themselves as their first argument; noalias scopes do this as well, but there is no noalias metadata in this example). A hasty fix is to change CHECK lines 76 and 78 to

    76 ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
    78 ; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}

However, this would mean that ACCESS_GROUP_OUTER belongs to ACCESSES_INNER and ACCESS_GROUP_INNER to ACCESSES_OUTER, which cannot be true. The reason is that MDNodes are emitted in the topological order of a depth-first search. The patch changed the order of ACCESS_GROUP_INNER and ACCESS_GROUP_OUTER in line 41, and because the emitted IR file lists the first argument of !3 before the second, ACCESS_GROUP_OUTER now ends up on line 42 and ACCESS_GROUP_INNER on line 43.

Even though this regression test uses proper regex-ing of metadata node numbers, it is still fragile in that it fails on irrelevant changes. Once it fails, it requires time to fix properly because it is non-obvious which CHECK lines are intended to match which output lines. The hasty fix would have mixed up ACCESS_GROUP_OUTER/ACCESS_GROUP_INNER, and I wouldn't expect someone who is working on the inliner to take the time to understand all the loop metadata. Set semantics of metadata occur often: noalias metadata, loop properties, tbaa metadata, etc. The difficulty is amplified when there are multiple functions in a regression test: the metadata of all functions is appended at the end of the file and MDNodes with the same content are uniqued, making it impossible to associate particular MDNodes with specific functions.

Ideally, the regression test would be robust and understandable, achievable with two asserts in a unittest:

    Loop &OuterLoop = **LI->begin();
    ASSERT_TRUE(OuterLoop.isAnnotatedParallel());

    Loop &InnerLoop = **OuterLoop.begin();
    ASSERT_TRUE(InnerLoop.isAnnotatedParallel());
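To make that suggestion concrete, here is a minimal sketch of how such a unittest could look. The harness around the two asserts is an assumption (the test name, function name, and IR string are hypothetical; a real test would use the inliner's output for parallel-loop-md-merge.ll), while parseAssemblyString, DominatorTree, LoopInfo, and Loop::isAnnotatedParallel are existing LLVM APIs:

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/AsmParser/Parser.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/SourceMgr.h"
    #include "gtest/gtest.h"
    using namespace llvm;

    // Hypothetical stand-in for the post-inlining IR: a doubly-nested loop
    // whose store belongs to both access groups (!0 inner, !1 outer).
    static const char *ParallelLoopIR = R"IR(
    define void @func(double* %A) {
    entry:
      br label %outer
    outer:
      %i = phi i64 [ 0, %entry ], [ %i.next, %outer.latch ]
      br label %inner
    inner:
      %j = phi i64 [ 0, %outer ], [ %j.next, %inner ]
      store double 4.200000e+01, double* %A, !llvm.access.group !2
      %j.next = add i64 %j, 1
      %c.inner = icmp ult i64 %j.next, 4
      br i1 %c.inner, label %inner, label %outer.latch, !llvm.loop !3
    outer.latch:
      %i.next = add i64 %i, 1
      %c.outer = icmp ult i64 %i.next, 4
      br i1 %c.outer, label %outer, label %exit, !llvm.loop !5
    exit:
      ret void
    }
    !0 = distinct !{}
    !1 = distinct !{}
    !2 = !{!0, !1}
    !3 = distinct !{!3, !4}
    !4 = !{!"llvm.loop.parallel_accesses", !0}
    !5 = distinct !{!5, !6}
    !6 = !{!"llvm.loop.parallel_accesses", !1}
    )IR";

    TEST(ParallelLoopMDMerge, BothLoopsAnnotatedParallel) {
      LLVMContext Ctx;
      SMDiagnostic Err;
      std::unique_ptr<Module> M = parseAssemblyString(ParallelLoopIR, Err, Ctx);
      ASSERT_TRUE(M);

      Function &F = *M->getFunction("func");
      DominatorTree DT(F);
      LoopInfo LI(DT);

      // The two asserts from above: robust against MDNode numbering,
      // emission order, and uniquing.
      Loop &OuterLoop = **LI.begin();
      ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
      Loop &InnerLoop = **OuterLoop.begin();
      ASSERT_TRUE(InnerLoop.isAnnotatedParallel());
    }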
Michael Kruse via llvm-dev
2020-Jul-01 16:33 UTC
[llvm-dev] [RFC] Compiled regression tests.
On Wed, Jul 1, 2020 at 10:18, Robinson, Paul <paul.robinson at sony.com> wrote:

> The test as written is fragile because it requires a certain ordering. If
> the output order is not important, use CHECK-DAG rather than CHECK. This
> would be a failure to understand the testing tool.

CHECK-DAG does not help here, since what changes is within a list on the same line, and we have no CHECK-SAME-DAG or CHECK-DAG-SAME. Even if we had it, the actual line that changed is textually the same, and FileCheck would need to backtrack deep into the following lines for alternative placeholder substitutions. It would look like

    CHECK-SAME-DAG: ![[ACCESS_GROUP_INNER:[0-9]+]]
    CHECK-SAME-DAG: ,
    CHECK-SAME-DAG: ![[ACCESS_GROUP_OUTER:[0-9]+]]

which allows the comma to appear anywhere and which I don't find readable. My (naive?) conclusion is that textual checking is not the right tool here.

> My experience, over a 40-year career, is that good software developers are
> generally not very good test-writers. These are different skills, and good
> testing is frequently not taught. It's easy to write fragile tests: you
> make your patch, you see what comes out, and you write the test to expect
> exactly that output, using the minimum possible features of the testing
> tool. This is poor technique all around. We even have scripts that
> automatically generate such tests, used primarily in codegen tests. I
> devoutly hope that the people who produce those tests responsibly eyeball
> all those cases.
>
> The proposal appears to be to migrate output-based tests (using
> ever-more-complicated FileCheck features) to executable tests, which makes
> them more like the software development people are used to instead of
> test-writing. But I don't see that necessarily solving the problem; it
> seems like it would be just as easy to write a program that doesn't do the
> right thing as to write a FileCheck test that doesn't do the right thing.

IMHO, having a tool that makes it possible to better express what is intended to be tested is already worth a lot. For instance, we usually don't care about SSA value names or MDNode numbers, but we have to put extra work into regex-ing away those names in FileCheck tests, and as a result, most tests we have still expect the exact number for metadata nodes. This is a problem if we want to emit new metadata nodes, in that all those tests need to be updated. This problem goes away if the test method by default ignored value names/MDNode numbers and software developers had to put in extra work if they actually wanted to verify them.

> Hal's suggestion is more to the point: if the output we're generating is
> not appropriate to the kinds of tests we want to perform, it can be
> worthwhile to generate different kinds of output. MIR is a case in point;
> for a long time it was hard to introspect into the interval between IR and
> final machine code, but now it's a lot easier.

Can you elaborate on what makes it easier?

Michael
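The point about ignoring MDNode numbers by default can already be approximated by checking metadata through the API rather than through text. A hedged sketch (findOptionMDForLoop is an existing helper declared in llvm/Analysis/LoopInfo.h; the wrapper function is hypothetical):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    // True if the loop's llvm.loop metadata names at least one access group
    // under llvm.loop.parallel_accesses, regardless of how the MDNodes are
    // numbered or ordered in the textual IR.
    static bool hasParallelAccesses(const Loop *L) {
      MDNode *MD = findOptionMDForLoop(L, "llvm.loop.parallel_accesses");
      // Operand 0 is the property name string; the rest are access groups.
      return MD && MD->getNumOperands() > 1;
    }

A check written this way cannot be broken by renumbering or reordering metadata nodes, which is the failure mode shown earlier in the thread.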
Robinson, Paul via llvm-dev
2020-Jul-01 17:24 UTC
[llvm-dev] [RFC] Compiled regression tests.
What I actually meant re. CHECK-DAG is to take this

    72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
    73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
    74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}

and turn it into this

    72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
    73 ; CHECK-DAG: ![[ACCESS_GROUP_INNER]] = distinct !{}
    74 ; CHECK-DAG: ![[ACCESS_GROUP_OUTER]] = distinct !{}

except that I think I was misled by the "non-semantic" remark about the change. Did you mean that the order of the INNER and OUTER elements (line 72) has been swapped? That sounds semantic, as far as the structure of the metadata is concerned! But okay, let's call that a syntactic change, and a test relying on the order of the parameters will break. Which it did. And the correct fix is instead

    72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_OUTER:[0-9]+]], ![[ACCESS_GROUP_INNER:[0-9]+]]}

is it not? To reflect the change in order?

But let's say I'm the one making this presumably innocuous change, I have no clue what I'm doing, and I don't know much about how FileCheck works (which is pretty typical of the community, I admit). You've shown issues with trying to diagnose the FileCheck results. How would the compiled regression test fail? How would it be easier to identify and repair the issue?

Re. how MIR makes testing easier: it is a serialization of the data that a machine-IR pass operates on, which makes it feasible to feed canned MIR through a single pass in llc and look at what exactly that pass did (see the example RUN line after this message). Prior to MIR, we had to go from IR to real machine code and infer what was going on in a pass after multiple levels of transformation had occurred. It was very black-box, and the black box was way too big.

Anyway, getting back to the original topic, it sounds like you're proposing two rather independent things: (1) an easier API for writing optimizer unittests, and (2) a way to push unittests into the main llvm/test tree. Can we separate these things?

Thanks,
--paulr
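For readers unfamiliar with that MIR workflow, a hypothetical single-pass RUN line (the choice of pass and triple is illustrative; llc's -run-pass flag is real):

    # RUN: llc -mtriple=x86_64-- -run-pass=machine-cse %s -o - | FileCheck %s

The test input is the serialized MIR from just before the pass under test, so the CHECK lines only have to describe what that one pass changed.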
David Greene via llvm-dev
2020-Jul-01 18:48 UTC
[llvm-dev] [RFC] Compiled regression tests.
"Robinson, Paul via llvm-dev" <llvm-dev at lists.llvm.org> writes:> We even have scripts that automatically generate such tests, used > primarily in codegen tests. I devoutly hope that the people who > produce those tests responsibly eyeball all those cases.I don't think people do because they are so noisy. Solving the problem was the primary motivation of my pile of test update script enhancements.> Hal’s suggestion is more to the point: If the output we’re generating > is not appropriate to the kinds of tests we want to perform, it can be > worthwhile to generate different kinds of output. MIR is a case in > point; for a long time it was hard to introspect into the interval > between IR and final machine code, but now it’s a lot easier.+1. I don't believe there's a silver bullet here. Different tools are appropriate for different kinds of tests. I happen to think Michael's proposal is another useful tool in the box, as is Hal's. -David
David Greene via llvm-dev
2020-Jul-01 18:58 UTC
[llvm-dev] [RFC] Compiled regression tests.
Michael Kruse via llvm-dev <llvm-dev at lists.llvm.org> writes:

> On Wed, Jul 1, 2020 at 10:18, Robinson, Paul <paul.robinson at sony.com> wrote:
>
>> The test as written is fragile because it requires a certain ordering. If
>> the output order is not important, use CHECK-DAG rather than CHECK. This
>> would be a failure to understand the testing tool.
>
> CHECK-DAG does not help here, since what changes is within a list on the
> same line, and we have no CHECK-SAME-DAG or CHECK-DAG-SAME. Even if we had
> it, the actual line that changed is textually the same, and FileCheck
> would need to backtrack deep into the following lines for alternative
> placeholder substitutions. It would look like
>
>     CHECK-SAME-DAG: ![[ACCESS_GROUP_INNER:[0-9]+]]
>     CHECK-SAME-DAG: ,
>     CHECK-SAME-DAG: ![[ACCESS_GROUP_OUTER:[0-9]+]]

Would this not work?

    CHECK-SAME: ![[ACCESS_GROUP_INNER:[0-9]+]]
    CHECK-SAME: ![[ACCESS_GROUP_OUTER:[0-9]+]]

I don't think CHECK-SAME is sensitive to order within the line. This works for me in my metadata tests, but maybe I've just been lucky.

> IMHO, having a tool that makes it possible to better express what is
> intended to be tested is already worth a lot.

Exactly.

> For instance, we usually don't care about SSA value names or MDNode
> numbers, but we have to put extra work into regex-ing away those names in
> FileCheck tests, and as a result, most tests we have still expect the
> exact number for metadata nodes. This is a problem if we want to emit new
> metadata nodes, in that all those tests need to be updated.

IME, many of our tests check way too much. Codegen tests are particularly egregious in this regard because many are (re-)generated and match whole functions of asm. One of the major problems with using FileCheck regexps in generated tests is that it's very difficult to take an update_*_test_checks.py output and whittle it down, because the FileCheck regexps define and use variables which need to be massaged while altering the test. I have some patches that help alleviate this.

> This problem goes away if the test method by default ignored value
> names/MDNode numbers and software developers had to put in extra work if
> they actually wanted to verify them.

If the test method is appropriate for the test, sure. But I suspect we'll have FileCheck tests for a good long while, and also making that experience better is well worth the effort.

-David
On 7/1/20 1:48 PM, David Greene via llvm-dev wrote:

> "Robinson, Paul via llvm-dev" <llvm-dev at lists.llvm.org> writes:
>
>> We even have scripts that automatically generate such tests, used
>> primarily in codegen tests. I devoutly hope that the people who
>> produce those tests responsibly eyeball all those cases.
>
> I don't think people do, because they are so noisy. Solving the problem
> was the primary motivation of my pile of test update script
> enhancements.
>
>> Hal's suggestion is more to the point: If the output we're generating
>> is not appropriate to the kinds of tests we want to perform, it can be
>> worthwhile to generate different kinds of output. MIR is a case in
>> point; for a long time it was hard to introspect into the interval
>> between IR and final machine code, but now it's a lot easier.
>
> +1. I don't believe there's a silver bullet here. Different tools are
> appropriate for different kinds of tests. I happen to think Michael's
> proposal is another useful tool in the box, as is Hal's.
>
> -David

To follow up on this, I also think that Michael's proposal is a potentially useful tool in our testing toolbox. Moreover, his motivation for proposing this points out a real issue that we have with our current set of tests.

Hindsight is 20/20, but the hardship associated with Michael's work updating the loop metadata tests probably would have been best addressed by having a good analysis output and checking that. It seems like a single bit of C++ code could, thus, be shared across many tests.

It seems to me that the use case Michael's proposal addresses is where we have a large number of test cases, each requiring complex verification logic, and logic not easily phrased in terms of checking a generic textual analysis summary. I'm not yet sure this particular use case exists.

His proposal also aims to do something else: make certain kinds of checks less fragile. As he points out, many useful concepts are easily expressed as CHECK-NOT, but because there's no binding between the text string in the check line and the code that might produce that text, it's easy for these to fall out of sync. His proposal, using C++ tests, provides better type checking in this regard. There may be other ways to address this concern (e.g., using FileCheck variables of some kind to positively match string name fragments before their use in CHECK-NOTs).

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
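Following up on the "good analysis output" suggestion, a minimal sketch of what such a shared summary printer could look like (the pass itself is hypothetical and its pass-builder registration is omitted; LoopAnalysis, PassInfoMixin, depth_first over loops, and Loop::isAnnotatedParallel are existing APIs):

    #include "llvm/ADT/DepthFirstIterator.h"
    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/PassManager.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    // Hypothetical printer pass: one stable line per loop, independent of
    // MDNode numbering, so tests can FileCheck this summary instead of
    // matching raw metadata nodes.
    struct ParallelLoopPrinterPass : PassInfoMixin<ParallelLoopPrinterPass> {
      PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
        LoopInfo &LI = FAM.getResult<LoopAnalysis>(F);
        for (Loop *TopLevel : LI)
          for (Loop *L : depth_first(TopLevel))
            outs() << F.getName() << ": loop " << L->getHeader()->getName()
                   << " depth=" << L->getLoopDepth() << " parallel="
                   << (L->isAnnotatedParallel() ? "yes" : "no") << "\n";
        return PreservedAnalyses::all();
      }
    };

With something like this, the metadata-merge test above would reduce to two CHECK lines over "parallel=yes", and the C++ verification logic would indeed be a single bit of code shared across many tests.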