Daniel Berlin via llvm-dev
2015-Nov-10 01:49 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
> The GCC test-suite, AFAIK, has very poor quality on what's considered
> a pass or a failure,

What makes you say this?

> and it's common to release GCC with thousands of
> failures on those tests.

Also not correct.

https://gcc.gnu.org/gcc-4.4/criteria.html

It is a zero-regression policy for primary platforms.

Look, I love LLVM as much as the next guy, but in the 15 years I worked
on GCC, through lots of major and minor releases, I can't remember a
single release with "thousands" of failures.
Renato Golin via llvm-dev
2015-Nov-10 09:50 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
On 10 November 2015 at 01:49, Daniel Berlin <dberlin at dberlin.org> wrote:
> https://gcc.gnu.org/gcc-4.4/criteria.html
>
> It is a zero regression policy for primary platforms.
>
> Look, I love LLVM as much as the next guy, but in the 15 years i worked on
> GCC, through lots of major and minor releases, i can't remember a single
> release with "thousands" of failures.

Hi Daniel,

This was not meant as a GCC vs LLVM rant. I have no affiliation nor an
agenda. I was merely stating the quality of compiler test suites, and
how valuable it would be to use the GCC tests in LLVM (or vice-versa).
I agree with Paul that the LLVM tests are little more than a smoke
screen, and from what I've seen, the GCC tests are just a bigger smoke
screen.

I would first try to understand what in the GCC suite is complementary
to ours, and what's redundant, before dumping it in.

I may be wrong, and my experience is largely around Linaro (kernel,
toolchain, android), so it may very well be biased. These are the data
points I have for my statements:

1. GCC trunk is less stable than LLVM because of the lack of general
buildbots.

 * Testing a new patch means comparing the test results (including
breakages) against the previous commit and checking the differences.
This is a poor definition of "pass", especially when the number of
failures is large.

 * On ARM and AArch64, the number of failures is around a couple of
thousand (I don't know the exact figure). AFAIK, these are not marked
XFAIL in any way, but are known to be broken for one reason or
another.

 * The set of failures is different for different sub-architectures,
and ARM developers have to know what's good and what's not based on
that. If XFAILs were used more consistently, they wouldn't have this
problem. I hear some people don't like to XFAIL because they want to
"one day fix the bug", but that's a personal opinion on the validity
of XFAILs.
 * Linaro monthly releases go out with those failures, and the fact
that they keep on going means the FSF releases do, too. This is a huge
cost on the release process, since it needs complicated diff programs
and often incurs manual analysis.

 * Comparing the previous release against the new one won't account
for new features/bugs that are introduced, and not all bugs get to
bugzilla. We have the same problem in LLVM, but our developers know
more or less what's being done. Not all of us track every new feature
introduced by GCC, so tracking their new bugs would be a major task.

2. Linux kernel and Android builds with new GCC have increasing
trouble.

 * I heard from both kernel and android engineers that every new GCC
release shows more failures than the previous difference on their
code. I.e. GCC 4.8->4.9 had a bigger delta than 4.7->4.8.

 * The LLVMLinux group reported more trouble moving between two GCC
releases than porting to LLVM.

 * Most problems are due to new warnings and errors, but some are bugs
that weren't caught by the regression nor the release processes.

I understand it's impossible to catch all bugs, and that both the
Linux Kernel and Android are large projects, but this demonstrates
that the GCC release process is as good (or bad) as our own, but in a
different mindset (focused on release validation rather than trunk)
and by a different community (which most of us don't track).

My conclusion is that, if we're ever going to incorporate the GCC
test-suite, it'll take a lot of time fudging it into a pass/fail
harness, and for every new version of it, we'll have the same amount
of work. Reiterating Paul's points, I believe those tests do not have
sufficient value to be worth the continuous effort.

That means we'll have to rely on companies to do secondary screening
for LLVM, something that I believe GCC would rather not do, but we
seem to be ok with.

Then again, I may be completely wrong.

cheers,
--renato
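[For context on the XFAIL mechanism discussed above: in GCC's
DejaGnu-driven testsuite, a known failure can be annotated in the test
source itself, so the harness reports it as XFAIL rather than FAIL on
the affected targets. A minimal sketch follows; the target pattern,
comment, and test body are hypothetical illustrations, not annotations
taken from the actual suite:]

```c
/* Hypothetical DejaGnu-style test annotation: dg-xfail-if tells the
   harness that this test is expected to fail on the listed targets,
   so a failure there is reported as XFAIL instead of FAIL.  */
/* { dg-do compile } */
/* { dg-xfail-if "hypothetical known ARM codegen bug" { arm*-*-* } } */
int f (void) { return 0; }
```

[LLVM's lit harness offers the analogous per-file `XFAIL:` directive.]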
Robinson, Paul via llvm-dev
2015-Nov-10 14:57 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
> I was merely stating the quality of compiler test suites, and how
> valuable it would be to use the GCC tests in LLVM (or vice-versa). I
> agree with Paul that the LLVM tests are little more than smoke screen,
> and from what I've seen, the GCC tests are just a bigger smoke screen.

Possible language issue here....

A "smoke screen" is something to obscure/conceal whatever is behind
it, by making it hard to see through the smoke.

A "smoke test" is the initial power-on to see if your new hardware
instantly starts billowing smoke (catches on fire).

I view the Clang/LLVM regression tests as a "smoke test" for
Clang/LLVM; the initial set of tests that tells you whether it is
worth proceeding to more thorough/expensive testing. Not a smoke
screen!

Thanks,
--paulr
Daniel Berlin via llvm-dev
2015-Nov-10 15:47 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
> 1. GCC trunk is less stable than LLVM because the lack of general
> buildbots.

GCC has plenty of buildbots; it has no revert-on-breakage policy.

> * Testing a new patch means comparing the test results (including
> breakages) against the previous commit, and check the differences.
> This is a poor definition of "pass", especially when the number of
> failures is large.

This is an artifact of the lack of a revert-on-breakage policy.

> * On ARM and AArch64, the number of failures is around a couple of
> thousand (don't know the exact figure). AFAIK, these are not marked
> XFAIL in any way, but are known to be broken for one reason or
> another.

That sounds like a failure on the part of the ARM developers.

> * Linaro monthly releases go out with those failures, and the fact
> that they keep on going means the FSF releases do, too.

I expect this would change if someone pushed. Here is, for example,
the failure list for i686-pc-linux-gnu for each 4.9 release:
https://gcc.gnu.org/gcc-4.9/buildstat.html

> This is a huge cost on the release process, since it needs
> complicated diff programs and often incur in manual analysis.

You say all this as if it is a GCC testsuite issue. It sounds
completely like a process issue that hasn't been raised and dealt
with. IE something that could easily happen to LLVM.
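[The result-comparison workflow both posters describe -- diff the
harness results of the previous commit against the new one and flag
only the fresh failures -- could be sketched roughly as below. This is
an illustrative script, not a tool from either project; it assumes
DejaGnu-style result lines such as "PASS: test-name" and
"FAIL: test-name", as found in .sum files:]

```python
import sys

OUTCOMES = ("PASS", "FAIL", "XFAIL", "XPASS")

def parse_results(path):
    """Map test name -> outcome from DejaGnu-style result lines."""
    results = {}
    with open(path) as f:
        for line in f:
            for outcome in OUTCOMES:
                prefix = outcome + ": "
                if line.startswith(prefix):
                    results[line[len(prefix):].strip()] = outcome
    return results

def regressions(baseline, current):
    """Tests now failing (FAIL or unexpected pass) that were not
    failing in the baseline run."""
    return sorted(
        name for name, outcome in current.items()
        if outcome in ("FAIL", "XPASS")
        and baseline.get(name) not in ("FAIL", "XPASS")
    )

if __name__ == "__main__" and len(sys.argv) == 3:
    base = parse_results(sys.argv[1])   # results from previous commit
    cur = parse_results(sys.argv[2])    # results from new commit
    for name in regressions(base, cur):
        print("REGRESSED:", name)
```

[Even a sketch like this shows why the workflow is fragile: "pass" is
defined only relative to the previous run's failure set.]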
Chris Lattner via llvm-dev
2015-Nov-10 18:03 UTC
[llvm-dev] How LLVM guarantee the qualify of the product from the limited test suit?
On Nov 10, 2015, at 1:50 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> My conclusion is that, if we're ever going to incorporate the GCC
> test-suite, it'll take a lot of time fudging it to be a pass/fail, and
> for every new version of it, we'll have the same amount of work.

Hi Renato,

It’s a non-technical concern, but please keep in mind that the GCC
testsuite is GPL licensed. You’re effectively discussing a fork of
their testsuite (to which I think that there are a number of technical
challenges), but before actually landing it, we’d want to carefully
discuss whether it is included as part of the LLVM project.

If you want to pursue this, doing this on github (or some similar
service) would be a better place to get started.

-Chris