Lately our random C program generator has seemed quite successful at catching regressions in llvm-gcc that the test suite misses. I'd suggest running some fixed number of random programs as part of the validation suite. On a fastish quad core I can test about 25,000 programs in 24 hours. Our hacked valgrind (which looks for volatile miscompilations) is a bottleneck; leaving it out would speed up the process considerably.

We've never tested llvm-gcc for x64 using random testing; doing this would likely turn up a nice crop of bugs.

I just started a random test run of llvm-gcc 2.0-2.4 that should provide some interesting quantitative results comparing these compilers in terms of crashes, volatile miscompilations, and regular miscompilations. However, it may take a month or so to get statistical significance, since 2.3 and 2.4 have quite low failure rates.

John Regehr
On Monday 10 November 2008 22:17, John Regehr wrote:

> Lately our random C program generator has seemed quite successful at
> catching regressions in llvm-gcc that the test suite misses. I'd suggest
> running some fixed number of random programs as part of the validation
> suite. On a fastish quad core I can test about 25,000 programs in 24
> hours.

The problem with random tests is that they're just that -- random. You can't have a known suite to validate with. Now, if we generate some tests that cause things to fail and then add those to the LLVM test suite, I'd be all for it.

> We've never tested llvm-gcc for x64 using random testing, doing this would
> likely turn up a nice crop of bugs.

Definitely. Random testing is certainly useful. Once random tests are added to a test suite, we can use them for validation. But I wouldn't want to require a validation to pass some set of random tests that shifts each test cycle.

-Dave
> Once random tests are added to a test suite, we can use them for
> validation. But I wouldn't want to require a validation to pass some set
> of random tests that shifts each test cycle.

This is easy to fix: just specify a starting seed for the PRNG.

However, I think you should get past your prejudice against tests that shift each cycle, since changing tests have the advantage of increased test coverage. Different parts of a test suite have different purposes, and of course random programs would not replace any part of the existing collection of fixed test cases. I wouldn't be making this argument if I hadn't seen for myself how one week of random testing gives you nothing, and the next week a whole pile of previously unknown failures.

Alternatively, we are working to generalize our program generator a bit so that it does a DFS or BFS to generate all programs smaller than some size bound (obviously we need to fudge on integer constants, for example by picking from a predetermined set of interesting constants). Once we do this, it may be worth adding the resulting test programs to LLVM's test suite.

John Regehr
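The fixed-seed point can be made concrete with a minimal sketch. The toy expression generator and function names below are invented for illustration (this is not the actual generator); it just shows that driving generation from a seeded PRNG makes the "random" suite byte-for-byte reproducible across validation cycles:

```python
import random

def gen_expr(rng, depth):
    """Emit a random C arithmetic expression as source text."""
    if depth == 0 or rng.random() < 0.3:
        return str(rng.randint(0, 99))
    op = rng.choice("+-*")
    return f"({gen_expr(rng, depth - 1)} {op} {gen_expr(rng, depth - 1)})"

def gen_program(seed, n_funcs=3, depth=3):
    """A fixed seed yields identical program text on every run,
    so the 'random' tests are the same in every validation cycle."""
    rng = random.Random(seed)
    return "\n".join(
        f"int t{i}(void) {{ return {gen_expr(rng, depth)}; }}"
        for i in range(n_funcs))

# Same seed, same programs: a stable suite. Bump the seed on purpose
# when you want a fresh batch for coverage.
assert gen_program(42) == gen_program(42)
```

This gives you both behaviors: pin the seed for a known, repeatable validation suite, or rotate it when you want the coverage benefits of fresh programs.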
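The "all programs smaller than some size bound" idea can be sketched the same way. This toy version (the grammar and the pool of interesting constants are assumptions for illustration, not the real generator's) exhaustively enumerates every arithmetic expression up to a depth bound, drawing literals from a small fixed pool rather than trying to enumerate all integers:

```python
# Assumed pool of "interesting" constants; the real choice would differ.
INTERESTING = [0, 1, -1, 2147483647]

def all_exprs(depth):
    """Yield every expression (as C source text) of nesting depth <= depth."""
    for c in INTERESTING:            # leaves: the fixed constant pool
        yield str(c)
    if depth == 0:
        return
    for op in "+-*":                 # interior nodes: each binary operator
        for lhs in all_exprs(depth - 1):
            for rhs in all_exprs(depth - 1):
                yield f"({lhs} {op} {rhs})"

# Depth 1: 4 leaves + 3 ops * 4 * 4 combinations = 52 expressions --
# a finite corpus that could be emitted once and checked into a test suite.
assert len(list(all_exprs(1))) == 52
```

Because the output is a fixed, finite set, these programs would sidestep the shifting-suite objection entirely.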