thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: End-to-end testing [Oct 2019]


Mehdi AMINI via llvm-dev

2019-Oct-09 02:14 UTC

[llvm-dev] [cfe-dev] RFC: End-to-end testing

> I have a bit of concern about this sort of thing - worrying it'll lead to
> people being less cautious about writing the more isolated tests.
>
I have the same concern. I really believe we need to be careful about
testing at the right granularity to keep things both modular and the
testing maintainable (for instance checking vectorized ASM from a C++
source through clang has always been considered a bad FileCheck practice).
(Not saying that there is no space for better integration testing in some
areas).

> That said, clearly there's value in end-to-end testing for all the reasons
> you've mentioned (& we do see these problems in practice - recently DWARF
> indexing broke when support for more nuanced language codes was added to
> Clang).
>
> Dunno if they need a new place or should just be more stuff in test-suite,
> though.
>
> On Tue, Oct 8, 2019 at 9:50 AM David Greene via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> [ I am initially copying only a few lists since they seem like
>>   the most impacted projects and I didn't want to spam all the mailing
>>   lists.  Please let me know if other lists should be included. ]
>>
>> I submitted D68230 for review but this is not about that patch per se.
>> The patch allows update_cc_test_checks.py to process tests that should
>> check target asm rather than LLVM IR.  We use this facility downstream
>> for our end-to-end tests.  It strikes me that it might be useful for
>> upstream to do similar end-to-end testing.
>>
>> Now that the monorepo is about to become the canonical source of truth,
>> we have an opportunity for convenient end-to-end testing that we didn't
>> easily have before with svn (yes, it could be done but in an ugly way).
>> AFAIK the only upstream end-to-end testing we have is in test-suite and
>> many of those codes are very large and/or unfocused tests.
>>
>> With the monorepo we have a place to put lit-style tests that exercise
>> multiple subprojects, for example tests that ensure the entire clang
>> compilation pipeline executes correctly.
>
I don't think I agree with the relationship to the monorepo: there was
nothing that prevented tests inside the clang project from exercising the
full pipeline already. I don't believe that the SVN repo structure was really
a factor in the way the testing was set up; instead it was a deliberate
choice in the way we structure our testing.
For instance I remember asking about implementing a test based on checking if
some loops written in a C source file were properly vectorized by the -O2 /
-O3 pipeline, and it was deemed the kind of test that we don't want to
maintain: instead I was pointed at the test-suite to add better benchmarks
there for the end-to-end story. What is interesting is that the test-suite
is not gonna be part of the monorepo!

To be clear: I'm not saying here we can't change our way of testing, I just
don't think the monorepo has anything to do with it and that it should be
carefully motivated and scoped into what belongs/doesn't belong to
integration tests.


>> We could, for example, create
>> a top-level "test" directory and put end-to-end tests there.  Some of
>> the things that could be tested include:
>>
>> - Pipeline execution (debug-pass=Executions)
>> - Optimization warnings/messages
>> - Specific asm code sequences out of clang (e.g. ensure certain loops
>>   are vectorized)
>> - Pragma effects (e.g. ensure loop optimizations are honored)
>> - Complete end-to-end PGO (generate a profile and re-compile)
>> - GPU/accelerator offloading
>> - Debuggability of clang-generated code
>>
>> Each of these things is tested to some degree within their own
>> subprojects, but AFAIK there are currently no dedicated tests ensuring
>> such things work through the entire clang pipeline flow and with other
>> tools that make use of the results (debuggers, etc.).  It is relatively
>> easy to break the pipeline while the individual subproject tests
>> continue to pass.
>>
>
I'm not sure I really see much in your list that isn't purely about testing
clang itself here?
Actually the first one seems more of a pure LLVM test.

>> I realize that some folks prefer to work on only a portion of the
>> monorepo (for example, they just hack on LLVM).  I am not sure how to
>> address those developers WRT end-to-end testing.  On the one hand,
>> requiring them to run end-to-end testing means they will have to at
>> least check out and build the monorepo.  On the other hand, it seems
>> less than ideal to have people developing core infrastructure and not
>> running tests.
>>
I think we already expect LLVM developers to update clang APIs? And we
revert LLVM patches when clang testing is broken. So I believe the
expectation to maintain the other in-tree projects isn't really new. It is
true that the monorepo will make it easy for everyone to reproduce most
failures locally and to find all the uses of an API across projects (which
was provided as a motivation to move to a monorepo model:
https://llvm.org/docs/Proposals/GitHubMove.html#monorepo ).

-- 
Mehdi

David Greene via llvm-dev

2019-Oct-09 15:12 UTC

[llvm-dev] [cfe-dev] RFC: End-to-end testing

Mehdi AMINI via cfe-dev <cfe-dev at lists.llvm.org> writes:
>> I have a bit of concern about this sort of thing - worrying it'll lead to
>> people being less cautious about writing the more isolated tests.
>>
>
> I have the same concern. I really believe we need to be careful about
> testing at the right granularity to keep things both modular and the
> testing maintainable (for instance checking vectorized ASM from a C++
> source through clang has always been considered a bad FileCheck practice).
> (Not saying that there is no space for better integration testing in some
> areas).
I absolutely disagree about vectorization tests.  We have seen
vectorization loss in clang even though related LLVM lit tests pass,
because something else in the clang pipeline changed that caused the
vectorizer to not do its job.  We need both kinds of tests.  There are
many asm tests of value beyond vectorization and they should include
component as well as end-to-end tests.
> For instance I remember asking about implementing a test based on checking if
> some loops written in a C source file were properly vectorized by the -O2 /
> -O3 pipeline, and it was deemed the kind of test that we don't want to
> maintain: instead I was pointed at the test-suite to add better benchmarks
> there for the end-to-end story. What is interesting is that the test-suite
> is not gonna be part of the monorepo!
And it shouldn't be.  It's much too big.  But there is a place for small
end-to-end tests that live alongside the code.
>>> We could, for example, create
>>> a top-level "test" directory and put end-to-end tests there.  Some of
>>> the things that could be tested include:
>>>
>>> - Pipeline execution (debug-pass=Executions)
>>>
>>> - Optimization warnings/messages
>>> - Specific asm code sequences out of clang (e.g. ensure certain loops
>>>   are vectorized)
>>> - Pragma effects (e.g. ensure loop optimizations are honored)
>>> - Complete end-to-end PGO (generate a profile and re-compile)
>>> - GPU/accelerator offloading
>>> - Debuggability of clang-generated code
>>>
>>> Each of these things is tested to some degree within their own
>>> subprojects, but AFAIK there are currently no dedicated tests ensuring
>>> such things work through the entire clang pipeline flow and with other
>>> tools that make use of the results (debuggers, etc.).  It is relatively
>>> easy to break the pipeline while the individual subproject tests
>>> continue to pass.
>>>
>>
>
> I'm not sure I really see much in your list that isn't purely about testing
> clang itself here?
Debugging and PGO involve other components, no?  If we want to put clang
end-to-end tests in the clang subdirectory, that's fine with me.  But we
need a place for tests that cut across components.

I could also imagine llvm-mca end-to-end tests through clang.
> Actually the first one seems more of a pure LLVM test.
Definitely not.  It would test the pipeline as constructed by clang,
which is very different from the default pipeline constructed by
opt/llc.  The old and new pass managers also construct different
pipelines.  As we have seen with various mailing list messages, this is
surprising to users.  Best to document and check it with testing.
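
A minimal sketch of what such a check could look like, assuming the pass list from each tool has already been captured as an ordered list of pass names. The two lists below are made up for illustration and are not real clang or opt output; the helper name pipeline_diff is likewise hypothetical.

```python
import difflib


def pipeline_diff(clang_passes, opt_passes):
    """Return unified-diff lines between two ordered lists of pass names."""
    return list(
        difflib.unified_diff(
            clang_passes,
            opt_passes,
            fromfile="clang",
            tofile="opt",
            lineterm="",
        )
    )


# Made-up pass lists for illustration only (an assumption, not captured output).
CLANG_O3 = ["Simplify the CFG", "Loop Vectorization", "SLP Vectorizer"]
OPT_O3 = ["Simplify the CFG", "SLP Vectorizer", "Loop Vectorization"]

for line in pipeline_diff(CLANG_O3, OPT_O3):
    print(line)
```

An empty result means the two pipelines list the same passes in the same order; any output pinpoints where they diverge.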

                  -David

Mehdi AMINI via llvm-dev

2019-Oct-09 19:38 UTC

[llvm-dev] [cfe-dev] RFC: End-to-end testing

On Wed, Oct 9, 2019 at 8:12 AM David Greene <dag at cray.com> wrote:
> Mehdi AMINI via cfe-dev <cfe-dev at lists.llvm.org> writes:
>
> >> I have a bit of concern about this sort of thing - worrying it'll lead to
> >> people being less cautious about writing the more isolated tests.
> >>
> >
> > I have the same concern. I really believe we need to be careful about
> > testing at the right granularity to keep things both modular and the
> > testing maintainable (for instance checking vectorized ASM from a C++
> > source through clang has always been considered a bad FileCheck practice).
> > (Not saying that there is no space for better integration testing in some
> > areas).
>
> I absolutely disagree about vectorization tests.  We have seen
> vectorization loss in clang even though related LLVM lit tests pass,
> because something else in the clang pipeline changed that caused the
> vectorizer to not do its job.

Of course, and as I mentioned I tried to add these tests (probably 4 or 5
years ago), but someone (I think Chandler?) was asking me at the time: does
it affect a benchmark's performance? If so why isn't it tracked there? And if
not does it matter?
The benchmark was presented as the actual way to check this invariant
(because you're only vectorizing to get performance, not for the sake of it).
So I never pursued it, even if I'm a bit puzzled that we don't have such
tests.



> We need both kinds of tests.  There are
> many asm tests of value beyond vectorization and they should include
> component as well as end-to-end tests.
>
> > For instance I remember asking about implementing a test based on checking if
> > some loops written in a C source file were properly vectorized by the -O2 /
> > -O3 pipeline, and it was deemed the kind of test that we don't want to
> > maintain: instead I was pointed at the test-suite to add better benchmarks
> > there for the end-to-end story. What is interesting is that the test-suite
> > is not gonna be part of the monorepo!
>
> And it shouldn't be.  It's much too big.  But there is a place for small
> end-to-end tests that live alongside the code.
>
> >>> We could, for example, create
> >>> a top-level "test" directory and put end-to-end tests there.  Some of
> >>> the things that could be tested include:
> >>>
> >>> - Pipeline execution (debug-pass=Executions)
> >>>
> >>> - Optimization warnings/messages
> >>> - Specific asm code sequences out of clang (e.g. ensure certain loops
> >>>   are vectorized)
> >>> - Pragma effects (e.g. ensure loop optimizations are honored)
> >>> - Complete end-to-end PGO (generate a profile and re-compile)
> >>> - GPU/accelerator offloading
> >>> - Debuggability of clang-generated code
> >>>
> >>> Each of these things is tested to some degree within their own
> >>> subprojects, but AFAIK there are currently no dedicated tests ensuring
> >>> such things work through the entire clang pipeline flow and with other
> >>> tools that make use of the results (debuggers, etc.).  It is relatively
> >>> easy to break the pipeline while the individual subproject tests
> >>> continue to pass.
> >>>
> >>
> >
> > I'm not sure I really see much in your list that isn't purely about testing
> > clang itself here?
>
> Debugging and PGO involve other components, no?

I don't think that you need anything other than LLVM core itself (which is a
dependency of clang)?

Things like PGO (unless you're using frontend instrumentation) don't even
have anything to do with clang, so we may get into the situation David
mentioned where we would rely on clang to test LLVM features, which I find
undesirable.


> If we want to put clang
> end-to-end tests in the clang subdirectory, that's fine with me.  But we
> need a place for tests that cut across components.
>
> I could also imagine llvm-mca end-to-end tests through clang.
>
> > Actually the first one seems more of a pure LLVM test.
>
> Definitely not.  It would test the pipeline as constructed by clang,
> which is very different from the default pipeline constructed by
> opt/llc.

I am not convinced it is "very" different (they are using the
PassManagerBuilder AFAIK); I am only aware of very subtle differences.
But more fundamentally: *should* they be different? I would want `opt -O3`
to be able to reproduce 1-1 the clang pipeline.
Isn't it the role of LLVM PassManagerBuilder to expose what is the "-O3"
pipeline?
If we see the PassManagerBuilder as the abstraction for the pipeline, then
I don't see what testing belongs to clang here; this seems like a layering
violation (and as someone maintaining the PassManagerBuilder in LLVM, I
wouldn't want to have to update the tests of all the subprojects using it
because they retest the same feature).


> The old and new pass managers also construct different
> pipelines.  As we have seen with various mailing list messages, this is
> surprising to users.  Best to document and check it with testing.
>
Yes: both old and new pass managers are LLVM components, so hopefully they
are documented and tested in LLVM :)

-- 
Mehdi

Florian Hahn via llvm-dev

2019-Oct-10 09:55 UTC

[llvm-dev] [cfe-dev] RFC: End-to-end testing

> On Oct 9, 2019, at 16:12, David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Mehdi AMINI via cfe-dev <cfe-dev at lists.llvm.org> writes:
> 
>>> I have a bit of concern about this sort of thing - worrying it'll lead to
>>> people being less cautious about writing the more isolated tests.
>>> 
>> 
>> I have the same concern. I really believe we need to be careful about
>> testing at the right granularity to keep things both modular and the
>> testing maintainable (for instance checking vectorized ASM from a C++
>> source through clang has always been considered a bad FileCheck practice).
>> (Not saying that there is no space for better integration testing in some
>> areas).
> 
> I absolutely disagree about vectorization tests.  We have seen
> vectorization loss in clang even though related LLVM lit tests pass,
> because something else in the clang pipeline changed that caused the
> vectorizer to not do its job.  We need both kinds of tests.  There are
> many asm tests of value beyond vectorization and they should include
> component as well as end-to-end tests.

Have you considered alternatives to checking the assembly for ensuring
vectorization or other transformations? For example, instead of checking the
assembly, we could check LLVM’s statistics or optimization remarks. If you want
to ensure a loop got vectorized, you could check the loop-vectorize remarks,
which should give you the position of the loop in the source and the
vectorization/interleave factor used. There are a few other things that could go
wrong later on that would prevent vector instruction selection, but I think it
should be sufficient to guard against most cases where we lose vectorization
and should be much more robust to unrelated changes. If there are additional
properties you want to ensure, they potentially could be added to the remark as
well.
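
As a rough illustration of the remark-based check, here is a minimal sketch that extracts vectorized loops from remark text. It assumes the remarks were captured as text in the usual "file:line:col: remark: vectorized loop (vectorization width: N, interleaved count: M)" shape; the exact wording may differ between compiler versions, and the sample text, file name and helper name below are hypothetical.

```python
import re

# Assumed remark shape (may vary by compiler version):
#   file:line:col: remark: vectorized loop (vectorization width: N, interleaved count: M)
REMARK_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): remark: vectorized loop "
    r"\(vectorization width: (?P<width>\d+), interleaved count: (?P<interleave>\d+)\)",
    re.MULTILINE,
)


def vectorized_loops(remark_text):
    """Map (file, line) to (width, interleave) for each vectorized-loop remark."""
    found = {}
    for m in REMARK_RE.finditer(remark_text):
        key = (m["file"], int(m["line"]))
        found[key] = (int(m["width"]), int(m["interleave"]))
    return found


# Hypothetical captured remarks; a real test would read the compiler's diagnostics.
SAMPLE = """\
saxpy.c:4:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
saxpy.c:11:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
"""

loops = vectorized_loops(SAMPLE)
# The test only requires that the loop at saxpy.c:4 was vectorized at all,
# which is less brittle than matching specific vector instructions.
assert ("saxpy.c", 4) in loops
```

Checking only that a given source loop appears in the result keeps the test independent of register allocation, scheduling and instruction selection details.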

This idea of leveraging statistics and optimization remarks to track the impact
of changes on overall optimization results is nothing new and I think several
people already discussed it in various forms. For regular benchmark runs, in
addition to tracking the existing benchmarks, we could also track selected
optimization remarks (e.g. loop-vectorize, but not necessarily noisy ones like
gvn) and statistics. Comparing those run-to-run could potentially highlight new
end-to-end issues on a much larger scale, across all existing benchmarks
integrated in test-suite. We might be able to detect loss in vectorization
pro-actively, instead of requiring someone to file a bug report and then we add
an isolated test after the fact.
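
The run-to-run comparison could be sketched roughly as follows, assuming each run has already been reduced to a set of (pass, file, line) remark keys and a dictionary of statistic counters. The function names, the counter name and the data are illustrative assumptions, not an existing tool.

```python
def lost_remarks(baseline, current):
    """Return remark keys present in the baseline run but missing now."""
    return sorted(set(baseline) - set(current))


def changed_statistics(baseline, current, tolerance=0.0):
    """Return {name: (old, new)} for counters whose relative change exceeds tolerance."""
    changed = {}
    for name in sorted(set(baseline) | set(current)):
        old = baseline.get(name, 0)
        new = current.get(name, 0)
        if old == new:
            continue
        # Guard against division by zero for counters that were absent before.
        if abs(new - old) / max(abs(old), 1) > tolerance:
            changed[name] = (old, new)
    return changed


# Illustrative data for two benchmark runs (hypothetical values).
BASE_REMARKS = {("loop-vectorize", "saxpy.c", 4), ("loop-vectorize", "dot.c", 9)}
CUR_REMARKS = {("loop-vectorize", "saxpy.c", 4)}
BASE_STATS = {"loop-vectorize.LoopsVectorized": 10}
CUR_STATS = {"loop-vectorize.LoopsVectorized": 8}

print(lost_remarks(BASE_REMARKS, CUR_REMARKS))
print(changed_statistics(BASE_STATS, CUR_STATS, tolerance=0.1))
```

A tolerance above zero is one way to keep noisy counters from flagging every run; which remarks and counters are stable enough to track would have to be decided per pass.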

But building something like this would be much more work, of course.

Cheers,
Florian
