thr3ads.net - llvm dev - [llvm-dev] [Openmp-dev] [cfe-dev] RFC: End-to-end testing [Oct 2019]

David Greene via llvm-dev

2019-Oct-16 20:00 UTC

[llvm-dev] [Openmp-dev] [cfe-dev] RFC: End-to-end testing

Renato Golin via Openmp-dev <openmp-dev at lists.llvm.org> writes:
> We already have tests in clang that check for diagnostics, IR and
> other things. Expanding those can handle 99.9% of what Clang could
> possibly do without descending into assembly.
I agree that for a great many things this is sufficient.
> Assembly errors are more complicated than just "not generating VADD",
> and that's easier done in the TS than LIT.
Can you elaborate?  I'm talking about very small tests targeted to
generate a specific instruction or small number of instructions.
Vectorization isn't the best example.  Something like verifying FMA
generation is a better example.

                        -David
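
For illustration, a small source-to-asm test of the kind described above might look like the following sketch. It assumes an x86-64 target with FMA enabled and clang's usual lit substitutions (`%clang`, `%s`). The function name `mul_add` and the CHECK pattern are hypothetical, and the instruction actually selected may differ between LLVM versions.

```c
// REQUIRES: x86-registered-target
// RUN: %clang --target=x86_64-unknown-linux-gnu -O2 -mfma -ffp-contract=fast \
// RUN:   -S -o - %s | FileCheck %s

// Assumed expectation: with FMA available and contraction allowed, the
// multiply and add below are fused into a single vfmadd* instruction.
// CHECK-LABEL: mul_add:
// CHECK: vfmadd{{[0-9]+}}sd
double mul_add(double a, double b, double c) {
  return a * b + c;
}
```

The check is deliberately loose (any scalar double `vfmadd` form) so that operand-order or register-allocation changes should not break it.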

Roman Lebedev via llvm-dev

2019-Oct-16 20:09 UTC

[llvm-dev] [cfe-dev] [Openmp-dev] RFC: End-to-end testing

FWIW I'm personally cautiously non-optimistic about this,
but maybe I'm just not seeing the whole picture of the proposal.

Both checking final asm and checking more than one layer of abstraction
feel overreaching and very prone to breakage/too restrictive.
Even minimal changes to the scheduling for a particular CPU can cause many
instructions to reorder.
I'm not sure what effect that will have on middle-end pass development, either.

If a change affects these end-to-end tests, what then?
Just blindly regenerate every affected test?
This will be further complicated once clang isn't the only upstream
front-end.

Roman.

David Blaikie via llvm-dev

2019-Oct-16 21:03 UTC

[llvm-dev] [cfe-dev] [Openmp-dev] RFC: End-to-end testing

On Wed, Oct 16, 2019 at 1:09 PM Roman Lebedev via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> FWIW I'm personally cautiously non-optimistic about this,
> but maybe I'm just not seeing the whole picture of the proposal.
>
> Both checking final asm and checking more than one layer of abstraction
> feel overreaching and very prone to breakage/too restrictive.
> Even minimal changes to the scheduling for a particular CPU can cause many
> instructions to reorder.
> I'm not sure what effect that will have on middle-end pass development,
> either.
>
> If a change affects these end-to-end tests, what then?
> Just blindly regenerate every affected test?
> This will be further complicated once clang isn't the only upstream
> front-end.
>
Agreed that the broader a test is, the more careful one has to be about
making it reliable in spite of other changes. Sometimes that's really
difficult (if you're trying to get a particular instruction selection or
register allocation), but in other cases it can be fairly reliable if done
carefully to sufficiently restrict optimizations: for instance, using
calls to external functions as sinks/sources for values, or picking places
where the output is already "optimal", and trivially/obviously so for
whatever set of constraints you've provided (no heroic optimizations), so
that it stays fairly stable.



- Dave
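
The sink/source idea above could be sketched roughly as follows. This is only an illustration under assumptions: the names `source`, `sink`, `last_sunk` and `fma_between_calls` are hypothetical, the target and flags are picked for the example, and the stubs are defined weak only so that the file links on its own. In a real lit test they would more likely be plain extern declarations.

```c
// REQUIRES: x86-registered-target
// RUN: %clang --target=x86_64-unknown-linux-gnu -O2 -mfma -ffp-contract=fast \
// RUN:   -S -o - %s | FileCheck %s

// Hypothetical stubs acting as value source and sink. Weak definitions
// may be replaced at link time, so the optimizer should not inline them
// or fold their results into the function under test.
static double last_sunk;
__attribute__((weak)) double source(void) { return 2.0; }
__attribute__((weak)) void sink(double v) { last_sunk = v; }

// The values come from opaque calls and leave through an opaque call,
// so the only interesting work left in between is the multiply-add.
// CHECK-LABEL: fma_between_calls:
// CHECK: vfmadd
void fma_between_calls(void) {
  double a = source();
  double b = source();
  double c = source();
  sink(a * b + c);
}
```

Because the inputs cannot be constant-folded and the result cannot be dropped, the checked instruction should stay in the output across unrelated optimizer changes.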

Renato Golin via llvm-dev

2019-Oct-17 10:00 UTC

[llvm-dev] [Openmp-dev] [cfe-dev] RFC: End-to-end testing

On Wed, 16 Oct 2019 at 21:00, David Greene <greened at obbligato.org> wrote:
> Can you elaborate?  I'm talking about very small tests targeted to
> generate a specific instruction or small number of instructions.
> Vectorization isn't the best example.  Something like verifying FMA
> generation is a better example.
To check that instructions are generated from source, a two-step test
is the best approach:
 - Verify that Clang emits different IR for different options, or the
right IR for a new functionality
 - Verify that the affected targets (or at least two of the main ones)
can take that IR and generate the right asm
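
As a rough sketch of the first step listed above, a Clang-side test could check only the emitted IR. The function name `scaled_sum` and the triple are assumptions made for this example. The expectation that `-ffp-contract=on` yields an `llvm.fmuladd` call reflects Clang's usual behaviour but should be verified against the version in use. The second step would be a separate `llc`-based test under the back-end's own test directory, not shown here.

```c
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -ffp-contract=on \
// RUN:   -emit-llvm -o - %s | FileCheck %s

// Assumed expectation: with contraction "on", clang represents a fusable
// a * b + c as a call to the llvm.fmuladd intrinsic in the emitted IR.
// No back-end needs to be built for this check to run.
// CHECK-LABEL: define{{.*}} double @scaled_sum(
// CHECK: call double @llvm.fmuladd.f64(
double scaled_sum(double a, double b, double c) {
  return a * b + c;
}
```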

Clang can emit LLVM IR for any target, but you don't necessarily need
to build the back-ends.

If you want to do the test in Clang all the way to asm, you need to
make sure the back-end is built. Clang is not always built with all
back-ends, possibly even none.

To do that in the back-end, you'd have to rely on Clang being built,
which is not always true.

Hacking our test infrastructure to test different things when a
combination of components is built, especially after they start to
merge after being in a monorepo, will complicate tests and increase
the likelihood that some tests will never be run by CI and bit rot.

On the test-suite, you can guarantee that the whole toolchain is
available: Front and back end of the compilers, assemblers (if
necessary), linkers, libraries, etc.

Writing a small source file per test, as you would in Clang/LLVM,
running LIT and FileCheck, and *always* running it in the TS would be
trivial.

--renato

Robinson, Paul via llvm-dev

2019-Oct-17 15:28 UTC

[llvm-dev] [Openmp-dev] [cfe-dev] RFC: End-to-end testing

Renato wrote:
> If you want to do the test in Clang all the way to asm, you need to
> make sure the back-end is built. Clang is not always built with all
> back-ends, possibly even none.
This is no different than today. Many tests in Clang require a specific
target to exist. Grep clang/test for "registered-target" for example;
I get 577 hits.  Integration tests (here called "end-to-end" tests)
clearly need to specify their REQUIRES conditions correctly.
> To do that in the back-end, you'd have to rely on Clang being built,
> which is not always true.
A frontend-based test in the backend would be a layering violation.
Nobody is suggesting that.
> Hacking our test infrastructure to test different things when a
> combination of components is built, especially after they start to
> merge after being in a monorepo, will complicate tests and increase
> the likelihood that some tests will never be run by CI and bit rot.
Monorepo isn't the relevant thing.  It's all about the build config.

Any test introduced by any patch today is expected to be run by CI.
This expectation would not be any different for these integration tests.
> On the test-suite, you can guarantee that the whole toolchain is
> available: Front and back end of the compilers, assemblers (if
> necessary), linkers, libraries, etc.
> 
> Writing a small source file per test, as you would in Clang/LLVM,
> running LIT and FileCheck, and *always* running it in the TS would be
> trivial.
I have to say, it's highly unusual for me to make a commit that
does *not* produce blame mail from some bot running lit tests.
Thankfully it's rare to get one that is actually my fault.

I can't remember *ever* getting blame mail related to test-suite.
Do they actually run?  Do they ever catch anything?  Do they ever
send blame mail?  I have to wonder about that.

Mehdi wrote:
> David Greene wrote:
>> Personally, I still find source-to-asm tests to be highly valuable and I
>> don't think we need test-suite for that.  Such tests don't (usually)
>> depend on system libraries (headers may occasionally be an issue but I
>> would argue that the test is too fragile in that case).
>>
>> So maybe we separate concerns.  Use test-suite to do the kind of
>> system-level testing you've discussed but still allow some tests in a
>> monorepo top-level directory that test across components but don't
>> depend on system configurations.
>> 
>> If people really object to a top-level monorepo test directory I guess
>> they could go into test-suite but that makes it much more cumbersome to
>> run what really should be very simple tests.
>
> The main thing I see that will justify push-back on such tests is the
> maintenance: you need to convince everyone that every component in LLVM
> must also maintain (update, fix, etc.) the tests that are in other
> components (clang, flang, other future subprojects, etc.). Changing the
> vectorizer in the middle-end may now require understanding what kind of
> update a test written in Fortran (or Haskell?) is checking with some
> Hexagon assembly. This is a non-trivial burden when you compute the
> full matrix of possible frontends and backends.
So how is this different from today?  If I put in a patch that breaks
Hexagon, or compiler-rt, or LLDB, none of which I really understand...
or omg Chrome, which isn't even an LLVM project... it's still my job to 
fix whatever is broken.  If it's some component where I am genuinely
clueless, I'm expected to ask for help.  Integration tests would not be 
any different.  

Flaky or fragile tests that constantly break for no good reason would
need to be replaced or made more robust.  Again this is no different
from any other flaky or fragile test.

I can understand people being worried that because an integration test
depends on more components, it has a wider "surface area" of potential
breakage points.  This, I claim, is exactly the *value* of such tests.
And I see them breaking primarily under two conditions.

1) Something is broken that causes other component-level failures.
   Fixing that component-level problem will likely fix the integration
   test as well; or, the integration test must be fixed the same way
   as the component-level tests.

2) Something is broken that does *not* cause other component-level
   failures.  That's exactly what integration tests are for!  They
   verify *interactions* that are hard or maybe impossible to test in
   a component-level way.

The worry I'm hearing is about a third category:

3) Integration tests fail due to fragility or overly-specific checks.

...which should be addressed in exactly the same way as our overly
fragile or overly specific component-level tests.  Is there some
reason they wouldn't be?

--paulr

David Greene via llvm-dev

2019-Oct-17 17:09 UTC

[llvm-dev] [Openmp-dev] [cfe-dev] RFC: End-to-end testing

Renato Golin <rengolin at gmail.com> writes:
> On Wed, 16 Oct 2019 at 21:00, David Greene <greened at obbligato.org> wrote:
>> Can you elaborate?  I'm talking about very small tests targeted to
>> generate a specific instruction or small number of instructions.
>> Vectorization isn't the best example.  Something like verifying FMA
>> generation is a better example.
>
> To check that instructions are generated from source, a two-step test
> is the best approach:
>  - Verify that Clang emits different IR for different options, or the
> right IR for a new functionality
>  - Verify that the affected targets (or at least two of the main ones)
> can take that IR and generate the right asm
Yes, of course we have tests like that.  We have found they are not
always sufficient.
> If you want to do the test in Clang all the way to asm, you need to
> make sure the back-end is built. Clang is not always built with all
> back-ends, possibly even none.
Right, which is why we have things like REQUIRES: x86-registered-target.
> To do that in the back-end, you'd have to rely on Clang being built,
> which is not always true.
Sure.
> Hacking our test infrastructure to test different things when a
> combination of components is built, especially after they start to
> merge after being in a monorepo, will complicate tests and increase
> the likelihood that some tests will never be run by CI and bit rot.
From other discussion, it sounds like at least some people are open to
asm tests under clang.  I think that should be fine.  But there are
probably other kinds of end-to-end tests that should not live under
clang.
> On the test-suite, you can guarantee that the whole toolchain is
> available: Front and back end of the compilers, assemblers (if
> necessary), linkers, libraries, etc.
>
> Writing a small source file per test, as you would in Clang/LLVM,
> running LIT and FileCheck, and *always* running it in the TS would be
> trivial.
How often would such tests be run as part of test-suite?

Honestly, it's not really clear to me exactly which bots cover what, how
often they run and so on.  Is there a document somewhere describing the
setup?

                     -David
