Daniel Berlin via llvm-dev
2015-Nov-10 01:49 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
> The GCC test-suite, AFAIK, has very poor quality on what's considered
> a pass or a failure,

What makes you say this?

> and it's common to release GCC with thousands of
> failures on those tests.

Also not correct.

https://gcc.gnu.org/gcc-4.4/criteria.html

It is a zero-regression policy for primary platforms.

Look, I love LLVM as much as the next guy, but in the 15 years I worked
on GCC, through lots of major and minor releases, I can't remember a
single release with "thousands" of failures.
Renato Golin via llvm-dev
2015-Nov-10 09:50 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
On 10 November 2015 at 01:49, Daniel Berlin <dberlin at dberlin.org> wrote:
> https://gcc.gnu.org/gcc-4.4/criteria.html
>
> It is a zero regression policy for primary platforms.
>
> Look, I love LLVM as much as the next guy, but in the 15 years i worked on
> GCC, through lots of major and minor releases, i can't remember a single
> release with "thousands" of failures.

Hi Daniel,

This was not meant as a GCC vs LLVM rant. I have no affiliation nor an
agenda. I was merely stating the quality of compiler test suites, and
how valuable it would be to use the GCC tests in LLVM (or vice-versa).
I agree with Paul that the LLVM tests are little more than a smoke
screen, and from what I've seen, the GCC tests are just a bigger smoke
screen.

I would first try to understand what in the GCC suite is complementary
to ours, and what's redundant, before dumping it in.

I may be wrong, and my experience is largely around Linaro (kernel,
toolchain, android), so it may very well be biased. These are the data
points I have for my statements:

1. GCC trunk is less stable than LLVM because of the lack of general
buildbots.

 * Testing a new patch means comparing the test results (including
breakages) against the previous commit and checking the differences.
This is a poor definition of "pass", especially when the number of
failures is large.

 * On ARM and AArch64, the number of failures is around a couple of
thousand (I don't know the exact figure). AFAIK, these are not marked
XFAIL in any way, but are known to be broken for one reason or
another.

 * The set of failures is different for different sub-architectures,
and ARM developers have to know what's good and what's not based on
that. If XFAILs were used more consistently, they wouldn't have this
problem. I hear some people don't like to XFAIL because they want to
"one day fix the bug", but that's a personal opinion on the validity
of XFAILs.
 * Linaro monthly releases go out with those failures, and the fact
that they keep on going means the FSF releases do, too. This is a huge
cost on the release process, since it needs complicated diff programs
and often incurs manual analysis.

 * Comparing the previous release against the new one won't account
for new features/bugs that are introduced, and not all bugs get to
bugzilla. We have the same problem in LLVM, but our developers know
more or less what's being done. Not all of us track every new feature
introduced by GCC, so tracking their new bugs would be a major task.

2. Linux kernel and Android builds with new GCC have increasing
trouble.

 * I heard from both kernel and android engineers that every new GCC
release shows more failures than the previous difference on their
code. I.e. GCC 4.8->4.9 had a bigger delta than 4.7->4.8.

 * The LLVMLinux group reported more trouble moving between two GCC
releases than porting to LLVM.

 * Most problems are due to new warnings and errors, but some are bugs
that weren't caught by the regression nor the release processes.

I understand it's impossible to catch all bugs, and that both the
Linux Kernel and Android are large projects, but this demonstrates
that the GCC release process is as good (or bad) as our own, but in a
different mindset (focused on release validation rather than trunk)
and by a different community (which most of us don't track).

My conclusion is that, if we're ever going to incorporate the GCC
test-suite, it'll take a lot of time fudging it into a pass/fail
harness, and for every new version of it, we'll have the same amount
of work. Reiterating Paul's points, I believe those tests do not have
sufficient value to be worth the continuous effort.

That means we'll have to rely on companies to do secondary screening
for LLVM, something that I believe GCC would rather not do, but we
seem to be ok with.

Then again, I may be completely wrong.

cheers,
--renato
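[For context on the XFAIL mechanism discussed above: in GCC's
DejaGnu-driven testsuite, a known failure can be annotated in the test
source itself, so the harness reports it as XFAIL rather than FAIL on
the affected targets. A minimal sketch follows; the target pattern,
comment, and test body are hypothetical illustrations, not annotations
taken from the actual suite:]

```c
/* Hypothetical DejaGnu-style test annotation: dg-xfail-if tells the
   harness that this test is expected to fail on the listed targets,
   so a failure there is reported as XFAIL instead of FAIL.  */
/* { dg-do compile } */
/* { dg-xfail-if "hypothetical known ARM codegen bug" { arm*-*-* } } */
int f (void) { return 0; }
```

[LLVM's lit harness offers the analogous per-file `XFAIL:` directive.]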
Robinson, Paul via llvm-dev
2015-Nov-10 14:57 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
> I was merely stating the quality of compiler test suites, and how
> valuable it would be to use the GCC tests in LLVM (or vice-versa). I
> agree with Paul that the LLVM tests are little more than smoke screen,
> and from what I've seen, the GCC tests are just a bigger smoke screen.

Possible language issue here....

A "smoke screen" is something to obscure/conceal whatever is behind
it, by making it hard to see through the smoke.

A "smoke test" is the initial power-on to see if your new hardware
instantly starts billowing smoke (catches on fire).

I view the Clang/LLVM regression tests as a "smoke test" for
Clang/LLVM; the initial set of tests that tells you whether it is
worth proceeding to more thorough/expensive testing. Not a smoke
screen!

Thanks,
--paulr
Daniel Berlin via llvm-dev
2015-Nov-10 15:47 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
> 1. GCC trunk is less stable than LLVM because the lack of general
> buildbots.

GCC has plenty of buildbots; it has no revert-on-breakage policy.

> * Testing a new patch means comparing the test results (including
> breakages) against the previous commit, and check the differences.
> This is a poor definition of "pass", especially when the number of
> failures is large.

This is an artifact of the lack of a revert-on-breakage policy.

> * On ARM and AArch64, the number of failures is around a couple of
> thousand (don't know the exact figure). AFAIK, these are not marked
> XFAIL in any way, but are known to be broken for one reason or
> another.

That sounds like a failure on the part of the ARM developers.

> * Linaro monthly releases go out with those failures, and the fact
> that they keep on going means the FSF releases do, too.

I expect this would change if someone pushed. Here is, for example,
the failure list for i686-pc-linux-gnu for each 4.9 release:
https://gcc.gnu.org/gcc-4.9/buildstat.html

> This is a huge cost on the release process, since it needs
> complicated diff programs and often incur in manual analysis.

You say all this as if it is a GCC testsuite issue. It sounds
completely like a process issue that hasn't been raised and dealt
with. IE something that could easily happen to LLVM.
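[The result-comparison workflow both posters describe -- diff the
harness results of the previous commit against the new one and flag
only the fresh failures -- could be sketched roughly as below. This is
an illustrative script, not a tool from either project; it assumes
DejaGnu-style result lines such as "PASS: test-name" and
"FAIL: test-name", as found in .sum files:]

```python
import sys

OUTCOMES = ("PASS", "FAIL", "XFAIL", "XPASS")

def parse_results(path):
    """Map test name -> outcome from DejaGnu-style result lines."""
    results = {}
    with open(path) as f:
        for line in f:
            for outcome in OUTCOMES:
                prefix = outcome + ": "
                if line.startswith(prefix):
                    results[line[len(prefix):].strip()] = outcome
    return results

def regressions(baseline, current):
    """Tests now failing (FAIL or unexpected pass) that were not
    failing in the baseline run."""
    return sorted(
        name for name, outcome in current.items()
        if outcome in ("FAIL", "XPASS")
        and baseline.get(name) not in ("FAIL", "XPASS")
    )

if __name__ == "__main__" and len(sys.argv) == 3:
    base = parse_results(sys.argv[1])   # results from previous commit
    cur = parse_results(sys.argv[2])    # results from new commit
    for name in regressions(base, cur):
        print("REGRESSED:", name)
```

[Even a sketch like this shows why the workflow is fragile: "pass" is
defined only relative to the previous run's failure set.]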
Chris Lattner via llvm-dev
2015-Nov-10 18:03 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
On Nov 10, 2015, at 1:50 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> My conclusion is that, if we're ever going to incorporate the GCC
> test-suite, it'll take a lot of time fudging it to be a pass/fail, and
> for every new version of it, we'll have the same amount of work.

Hi Renato,

It’s a non-technical concern, but please keep in mind that the GCC
testsuite is GPL licensed. You’re effectively discussing a fork of
their testsuite (to which I think that there are a number of technical
challenges), but before actually landing it, we’d want to carefully
discuss whether it is included as part of the LLVM project.

If you want to pursue this, doing this on github (or some similar
service) would be a better place to get started.

-Chris