[ I am initially copying only a few lists since they seem like the most
impacted projects and I didn't want to spam all the mailing lists.
Please let me know if other lists should be included. ]

I submitted D68230 for review but this is not about that patch per se.
The patch allows update_cc_test_checks.py to process tests that should
check target asm rather than LLVM IR. We use this facility downstream
for our end-to-end tests. It strikes me that it might be useful for
upstream to do similar end-to-end testing.

Now that the monorepo is about to become the canonical source of truth,
we have an opportunity for convenient end-to-end testing that we didn't
easily have before with svn (yes, it could be done, but in an ugly
way). AFAIK the only upstream end-to-end testing we have is in
test-suite, and many of those codes are very large and/or unfocused
tests.

With the monorepo we have a place to put lit-style tests that exercise
multiple subprojects, for example tests that ensure the entire clang
compilation pipeline executes correctly. We could, for example, create
a top-level "test" directory and put end-to-end tests there. Some of
the things that could be tested include:

- Pipeline execution (debug-pass=Executions)
- Optimization warnings/messages
- Specific asm code sequences out of clang (e.g. ensure certain loops
  are vectorized; a sketch of such a test follows this message)
- Pragma effects (e.g. ensure loop optimizations are honored)
- Complete end-to-end PGO (generate a profile and re-compile)
- GPU/accelerator offloading
- Debuggability of clang-generated code

Each of these things is tested to some degree within their own
subprojects, but AFAIK there are currently no dedicated tests ensuring
such things work through the entire clang pipeline flow and with other
tools that make use of the results (debuggers, etc.). It is relatively
easy to break the pipeline while the individual subproject tests
continue to pass.

I realize that some folks prefer to work on only a portion of the
monorepo (for example, they just hack on LLVM). I am not sure how to
address those developers WRT end-to-end testing. On the one hand,
requiring them to run end-to-end testing means they will have to at
least check out and build the monorepo. On the other hand, it seems
less than ideal to have people developing core infrastructure and not
running tests.

I don't yet have a formal proposal but wanted to put this out to spur
discussion and gather feedback and ideas. Thank you for your interest
and participation!

-David
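To make this concrete, here is a minimal sketch of the kind of
lit-style end-to-end test meant by the "specific asm code sequences out
of clang" bullet above. It assumes a lit configuration where %clang
expands to the just-built clang; the function, target, and checked
mnemonic are illustrative choices, not something proposed in the RFC.

    // RUN: %clang -O2 --target=x86_64-unknown-linux-gnu -mavx2 -S -o - %s \
    // RUN:   | FileCheck %s

    // Check that this loop survives the whole clang pipeline as vector
    // code: the generated asm should contain a packed AVX add.
    // CHECK-LABEL: vadd:
    // CHECK: vaddps
    void vadd(float *restrict c, const float *restrict a,
              const float *restrict b, int n) {
      for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    }

Such a test would run under llvm-lit like any component test, but it
exercises the full source-to-asm flow rather than a single stage.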
> -----Original Message-----
> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of David Greene via cfe-dev
> Sent: Tuesday, October 08, 2019 12:50 PM
> To: llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org; openmp-dev at lists.llvm.org; lldb-dev at lists.llvm.org
> Subject: [cfe-dev] RFC: End-to-end testing
>
> [...]
>
> Each of these things is tested to some degree within their own
> subprojects, but AFAIK there are currently no dedicated tests ensuring
> such things work through the entire clang pipeline flow and with other
> tools that make use of the results (debuggers, etc.). It is relatively
> easy to break the pipeline while the individual subproject tests
> continue to pass.

I agree with all your points. End-to-end testing is a major hole in the
project infrastructure; it has been largely left up to the individual
vendors/packagers/distributors. The Clang tests verify that Clang will
produce some sort of not-unreasonable IR for given situations; the LLVM
tests verify that some (other) set of input IR will produce something
that looks not-unreasonable on the target side. Very little connects
the two.

There is more than nothing:

- test-suite has some quantity of code that is compiled end-to-end for
  some targets.
- lldb can be set up to use the just-built Clang to compile its tests,
  but those are focused on debug info and are nothing like
  comprehensive.
- libcxx likely also can use the just-built Clang, although I've never
  tried it so I don't know for sure. It obviously exercises just the
  runtime side of things.
- compiler-rt likewise. The sanitizer tests in particular I'd expect to
  be using the just-built Clang.
- debuginfo-tests also uses the just-built Clang but is a pathetically
  small set, and again focused on debug info.

I'm not saying the LLVM Project should invest in a commercial suite
(although I'd expect vendors to do so; Sony does). But a place to *put*
end-to-end tests seems entirely reasonable and useful, although I would
resist calling it simply "tests" (we have too many directories with
that name already).

> I realize that some folks prefer to work on only a portion of the
> monorepo (for example, they just hack on LLVM). I am not sure how to
> address those developers WRT end-to-end testing. On the one hand,
> requiring them to run end-to-end testing means they will have to at
> least check out and build the monorepo. On the other hand, it seems
> less than ideal to have people developing core infrastructure and not
> running tests.

People should obviously be running the tests for the project(s) they're
modifying. People aren't expected to run everything. That's why...
Bots. "Don't argue with the bots." I don't check out and build and test
everything, and I've broken LLDB, compiler-rt, and probably others from
time to time. Probably everybody has broken other projects
unexpectedly. That's what bots are for: to run the combinations and the
projects that I don't have the infrastructure or resources to do
myself. It's not up to me to run everything possible before committing;
it IS up to me to respond promptly to bot failures for my changes. I
don't see a new end-to-end test project being any different in that
respect.

> I don't yet have a formal proposal but wanted to put this out to spur
> discussion and gather feedback and ideas. Thank you for your interest
> and participation!

Thanks for bringing it up! It's been a pebble in my shoe for a long
time.

--paulr
David Blaikie via llvm-dev
2019-Oct-08 19:22 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
I have a bit of concern about this sort of thing - worrying it'll lead
to people being less cautious about writing the more isolated tests.

That said, clearly there's value in end-to-end testing for all the
reasons you've mentioned (& we do see these problems in practice -
recently DWARF indexing broke when support for more nuanced language
codes was added to Clang).

Dunno if they need a new place or should just be more stuff in
test-suite, though.

On Tue, Oct 8, 2019 at 9:50 AM David Greene via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> [...]
David Greene via llvm-dev
2019-Oct-08 19:46 UTC
[llvm-dev] [Openmp-dev] [cfe-dev] RFC: End-to-end testing
David Blaikie via Openmp-dev <openmp-dev at lists.llvm.org> writes:

> I have a bit of concern about this sort of thing - worrying it'll lead
> to people being less cautious about writing the more isolated tests.

That's a fair concern. Reviewers will still need to insist on small
component-level tests to go along with patches. We don't have to
sacrifice one to get the other.

> Dunno if they need a new place or should just be more stuff in
> test-suite, though.

There are at least two problems I see with using test-suite for this:

- It is a separate repository and thus is not as convenient as tests
  that live with the code. One cannot commit an end-to-end test
  atomically with the change meant to be tested.
- It is full of large codes, which is not the kind of testing I'm
  talking about.

Let me describe how I recently added some testing in our downstream
fork (a sketch of what the resulting tests might look like follows this
message):

- I implemented a new feature along with a C source test.
- I used clang to generate asm from that test and captured the small
  piece of it I wanted to check in an end-to-end test.
- I used clang to generate IR just before the feature kicked in and
  created an opt-style test for it. Generating this IR is not always
  straightforward and it would be great to have better tools to do
  this, but that's another discussion.
- I took the IR out of opt (after running my feature) and created an
  llc-style test out of it to check the generated asm. The checks are
  the same as in the original C end-to-end test.

So the tests are checking at each stage that the expected input is
generating the expected output, and the end-to-end test checks that we
go from source to asm correctly. These are all really small tests,
easily runnable as part of the normal "make check" process.

-David
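A sketch of what that chain of tests might look like. The pass name
(my-new-pass), the file layout, and the checked mnemonic are
placeholders made up for illustration; the actual downstream feature
and its checks are not shown in this thread.

    // (1) End-to-end test: C source through the full clang pipeline,
    //     checking only the small piece of asm the feature is
    //     responsible for.
    // RUN: %clang -O2 --target=x86_64-unknown-linux-gnu -S -o - %s | FileCheck %s
    // CHECK-LABEL: foo:
    // CHECK: addps
    void foo(float *restrict a, const float *restrict b, int n) {
      for (int i = 0; i < n; ++i)
        a[i] += b[i];
    }

    // (2) opt-style component test (a separate .ll file, not shown):
    //     the IR captured just before the new feature kicks in, run
    //     through only that pass and checked for the transformed IR,
    //     e.g.
    //       RUN: opt -passes=my-new-pass -S < %s | FileCheck %s
    //
    // (3) llc-style component test (a separate .ll file, not shown):
    //     the IR taken out of opt after the feature ran, compiled with
    //     llc and checked for the same asm as the end-to-end test (1),
    //     e.g.
    //       RUN: llc -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
    //       CHECK: addps

Each piece stays small, and only the end-to-end test depends on the
whole pipeline.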
Mehdi AMINI via llvm-dev
2019-Oct-09 02:14 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
> I have a bit of concern about this sort of thing - worrying it'll lead
> to people being less cautious about writing the more isolated tests.

I have the same concern. I really believe we need to be careful about
testing at the right granularity to keep things both modular and the
testing maintainable (for instance, checking vectorized ASM from a C++
source through clang has always been considered a bad FileCheck
practice). (Not saying that there is no space for better integration
testing in some areas.)

> That said, clearly there's value in end-to-end testing for all the
> reasons you've mentioned (& we do see these problems in practice -
> recently DWARF indexing broke when support for more nuanced language
> codes was added to Clang).
>
> Dunno if they need a new place or should just be more stuff in
> test-suite, though.
>
> On Tue, Oct 8, 2019 at 9:50 AM David Greene via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> [...]
>>
>> With the monorepo we have a place to put lit-style tests that
>> exercise multiple subprojects, for example tests that ensure the
>> entire clang compilation pipeline executes correctly.

I don't think I agree with the relationship to the monorepo: there was
nothing that prevented tests inside the clang project from exercising
the full pipeline already. I don't believe that the SVN repo structure
was really a factor in the way the testing was set up; instead, it was
a deliberate choice in the way we structure our testing. For instance,
I remember asking about implementing tests that check whether some
loops written in a C source file are properly vectorized by the -O2 /
-O3 pipeline, and it was deemed the kind of test that we don't want to
maintain: instead I was pointed at the test-suite to add better
benchmarks there for the end-to-end story. What is interesting is that
the test-suite is not gonna be part of the monorepo!

To be clear: I'm not saying here we can't change our way of testing, I
just don't think the monorepo has anything to do with it, and any
change should be carefully motivated and scoped into what
belongs/doesn't belong to integration tests.

>> We could, for example, create a top-level "test" directory and put
>> end-to-end tests there. Some of the things that could be tested
>> include:
>>
>> - Pipeline execution (debug-pass=Executions)
>> - Optimization warnings/messages
>> - Specific asm code sequences out of clang (e.g. ensure certain loops
>>   are vectorized)
>> - Pragma effects (e.g. ensure loop optimizations are honored)
>> - Complete end-to-end PGO (generate a profile and re-compile)
>> - GPU/accelerator offloading
>> - Debuggability of clang-generated code
>>
>> [...]

I'm not sure I really see much in your list that isn't purely about
testing clang itself here? Actually the first one seems more of a pure
LLVM test.

>> I realize that some folks prefer to work on only a portion of the
>> monorepo (for example, they just hack on LLVM). [...] On the other
>> hand, it seems less than ideal to have people developing core
>> infrastructure and not running tests.

I think we already expect LLVM developers to update clang APIs? And we
revert LLVM patches when clang testing is broken. So the expectation to
maintain the other in-tree projects isn't really new. It is true that
the monorepo will make it easy for everyone to reproduce most failures
locally and to find all the uses of an API across projects (which was
provided as a motivation to move to a monorepo model:
https://llvm.org/docs/Proposals/GitHubMove.html#monorepo ).

--
Mehdi
On 10/8/19 9:49 AM, David Greene via llvm-dev wrote:
> [...]
>
> Each of these things is tested to some degree within their own
> subprojects, but AFAIK there are currently no dedicated tests ensuring
> such things work through the entire clang pipeline flow and with other
> tools that make use of the results (debuggers, etc.). It is relatively
> easy to break the pipeline while the individual subproject tests
> continue to pass.

The two major concerns I see are a potential decay in component test
quality and an increase in the difficulty of changing components. The
former has already been discussed a bit downstream, so let me focus on
the latter.

A challenge we already have - as in, I've broken these tests and had to
fix them - is that an end-to-end test which checks either IR or
assembly ends up being extraordinarily fragile. Completely unrelated
profitable transforms create small differences which cause spurious
test failures. This is a very real issue today with the few end-to-end
clang tests we have, and I am extremely hesitant to expand those tests
without giving this workflow problem serious thought. If we don't, this
could bring development on middle-end transforms to a complete stop.
(Not kidding.)

A couple of approaches we could consider:

1. Simply restrict end-to-end tests to crash/assert cases (i.e. no
   property of the generated code is checked, other than that it is
   generated). This isn't as restrictive as it sounds when combined
   with coverage-guided fuzzer corpuses (a sketch of such a test
   follows this message).
2. Auto-update all diffs, but report them to a human user for
   inspection. This ends up meaning that tests never "fail" per se, but
   that individuals who have expressed interest in particular tests get
   an automated notification and a chance to respond on list with a
   reduced example.
3. As a variant on the former, don't auto-update tests, but only inform
   the *contributor* of an end-to-end test of a failure. Responsibility
   for determining failure vs false positive lies solely with them, and
   normal channels are used to report a failure after it has been
   confirmed/analyzed/explained.

I really think this is a problem we need to have thought through and
found a workable solution for before end-to-end testing as proposed
becomes a practically workable option.

Philip
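For option 1, a minimal sketch of a crash/assert-only end-to-end test
might look like the following; the flags and source are illustrative,
and no property of the output is checked.

    // RUN: %clang -O3 -g --target=x86_64-unknown-linux-gnu -c -o /dev/null %s
    // No FileCheck: the test passes as long as the full clang pipeline
    // (parse, optimize, codegen) gets through this input without
    // crashing or asserting. Inputs like this could also be drawn from
    // a coverage-guided fuzzer corpus.
    struct S { int a; float b; };
    float sum(struct S *s, int n) {
      float t = 0;
      for (int i = 0; i < n; ++i)
        t += s[i].a + s[i].b;
      return t;
    }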
David Greene via llvm-dev
2019-Oct-10 01:25 UTC
[llvm-dev] [cfe-dev] RFC: End-to-end testing
Philip Reames via cfe-dev <cfe-dev at lists.llvm.org> writes:

> A challenge we already have - as in, I've broken these tests and had
> to fix them - is that an end-to-end test which checks either IR or
> assembly ends up being extraordinarily fragile. Completely unrelated
> profitable transforms create small differences which cause spurious
> test failures. This is a very real issue today with the few end-to-end
> clang tests we have, and I am extremely hesitant to expand those tests
> without giving this workflow problem serious thought. If we don't,
> this could bring development on middle-end transforms to a complete
> stop. (Not kidding.)

Do you have a pointer to these tests? We literally have tens of
thousands of end-to-end tests downstream and while some are fragile,
the vast majority are not. A test that, for example, checks the entire
generated asm for a match is indeed very fragile. A test that checks
whether a specific instruction/mnemonic was emitted is generally not,
at least in my experience (see the sketch after this message).
End-to-end tests require some care in construction. I don't think
update_llc_test_checks.py-type operation is desirable. Still, you raise
a valid point and I think you present some good options below.

> A couple of approaches we could consider:
>
> 1. Simply restrict end-to-end tests to crash/assert cases (i.e. no
>    property of the generated code is checked, other than that it is
>    generated). This isn't as restrictive as it sounds when combined
>    with coverage-guided fuzzer corpuses.

I would be pretty hesitant to do this but I'd like to hear more about
how you see this working with coverage/fuzzing.

> 2. Auto-update all diffs, but report them to a human user for
>    inspection. This ends up meaning that tests never "fail" per se,
>    but that individuals who have expressed interest in particular
>    tests get an automated notification and a chance to respond on
>    list with a reduced example.

That's certainly workable.

> 3. As a variant on the former, don't auto-update tests, but only
>    inform the *contributor* of an end-to-end test of a failure.
>    Responsibility for determining failure vs false positive lies
>    solely with them, and normal channels are used to report a failure
>    after it has been confirmed/analyzed/explained.

I think I like this best of the three, but it raises the question of
what happens when the contributor is no longer contributing. Who's
responsible for the test? Maybe it just sits there until someone else
claims it.

> I really think this is a problem we need to have thought through and
> found a workable solution for before end-to-end testing as proposed
> becomes a practically workable option.

Noted. I'm very happy to have this discussion and work the problem.

-David
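To illustrate the fragility contrast above, here is a sketch (with an
invented function and target assumptions) of the two check styles being
discussed: pinning the whole instruction sequence versus checking only
for the mnemonic the test actually cares about.

    // RUN: %clang -O2 --target=x86_64-unknown-linux-gnu -S -o - %s | FileCheck %s

    // A fragile test would pin the entire instruction sequence, e.g.
    // matching
    //   movss (%rdi), %xmm0 / addss (%rsi), %xmm0 / retq
    // line by line; any unrelated scheduling or register-allocation
    // change then breaks the test.
    //
    // A focused test checks only the property of interest: that a
    // scalar floating-point add is emitted for this function.
    // CHECK-LABEL: fadd:
    // CHECK: addss
    float fadd(const float *a, const float *b) { return *a + *b; }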