John Regehr
2008-Jul-20 21:19 UTC
[LLVMdev] qualitative comparison of correctness of llvm and gcc
Hi folks, We recently generated some data that seemed interesting enough to share here. This is a comparison between compilers that ignores the performance of the generated code and focuses only on compiler correctness. volatile checksum errors errors avr-gcc-3.4 1.879% 0.378% avr-gcc-4.1 0.037% 0.256% avr-gcc-4.2 0.753% 0.225% gcc-3.4 1.222% 0.004% gcc-4.0 0.041% 0.026% gcc-4.1 0.196% 0.024% gcc-4.2 0.770% 0.004% gcc-4.3 0.727% 0.003% llvm-gcc-2.2 18.712% 0.110% llvm-gcc-pre-2.4 0.000% 0.006% Notes: These numbers come from throwing 100,000 random C programs at the various compilers. For each compiler, we compiled each test case at -O0, -O1, -O2, -Os, and -O3 and compared the results of these compilations. Random programs that crashed a compiler, and executables that failed to terminate in a timely fashion, did not count towards the 100,000. Target is ia32 except for the top three compilers, which target AVR, an 8-bit MCU. Host is Ubuntu Feisty. A "volatile error" indicates a case where a compiler failed to respect the volatile invariant. The volatile invariant is simply that changing the optimization level of a strictly conforming C program must not change the number of dynamic loads or stores to any variable that is volatile-qualified in the compiler's input. We check this with a hacked valgrind (for x86) or a hacked simulator (for AVR). A "checksum error" indicates a case where changing the optimization level of a compiler changed the computation performed by its generated code in a way that is not correctness-preserving. The checksum is taken over the global variables in the random program just before it terminates. We believe that our random programs are free of unspecified and undefined behavior (e.g. divide by zero, pointer misuse, uninitialized variable use, dependency on order of evaluation, etc.). Therefore, every checksum or volatile error is the result of a real compiler bug. We can't easily prove this. However, whenever we look closely at an apparent error, it becomes possible to legitimately blame the compiler. I don't have handy the exact svn rev of the pre-2.4 llvm that we tested but it's from a few weeks ago. The improvement in llvm between 2.2 and now is no doubt the result of many factors. I would argue that one of the most important factors was the llvm team's rapid fixing of 5 volatile bugs and 8 code generation bugs that we reported. We didn't test 2.3 because some bugs that we reported were fixed before that release, some after. Current llvm is also far harder to crash than 2.2, although we lack statistics on this. Since the snapshot a few weeks ago, the llvm team has fixed a few more bugs. If this process continues it seems possible that within the foreseeable future, llvm will be effectively free of crashes, codegen errors, and volatile errors (for the class of random programs that we generate). If anyone is curious about the kinds of bugs that are found through random testing, a list follows. Volatile: http://llvm.org/bugs/show_bug.cgi?id=2182 http://llvm.org/bugs/show_bug.cgi?id=2234 http://llvm.org/bugs/show_bug.cgi?id=2260 http://llvm.org/bugs/show_bug.cgi?id=2262 http://llvm.org/bugs/show_bug.cgi?id=2496 Codegen: http://llvm.org/bugs/show_bug.cgi?id=2271 http://llvm.org/bugs/show_bug.cgi?id=2274 http://llvm.org/bugs/show_bug.cgi?id=2276 http://llvm.org/bugs/show_bug.cgi?id=2294 http://llvm.org/bugs/show_bug.cgi?id=2364 http://llvm.org/bugs/show_bug.cgi?id=2435 http://llvm.org/bugs/show_bug.cgi?id=2479 http://llvm.org/bugs/show_bug.cgi?id=2487 http://llvm.org/bugs/show_bug.cgi?id=2568 Crash/hang: http://llvm.org/bugs/show_bug.cgi?id=2264 http://llvm.org/bugs/show_bug.cgi?id=2323 http://llvm.org/bugs/show_bug.cgi?id=2326 http://llvm.org/bugs/show_bug.cgi?id=2234 http://llvm.org/bugs/show_bug.cgi?id=2372 http://llvm.org/bugs/show_bug.cgi?id=2422 http://llvm.org/bugs/show_bug.cgi?id=2455 http://llvm.org/bugs/show_bug.cgi?id=2497 http://llvm.org/bugs/show_bug.cgi?id=2497 http://llvm.org/bugs/show_bug.cgi?id=2513 http://llvm.org/bugs/show_bug.cgi?id=2570 John Regehr
John Regehr
2008-Jul-20 21:23 UTC
[LLVMdev] qualitative comparison of correctness of llvm and gcc
Of course I meant "quantitative comparison." John
Bill Wendling
2008-Jul-20 23:10 UTC
[LLVMdev] qualitative comparison of correctness of llvm and gcc
On Jul 20, 2008, at 2:19 PM, John Regehr wrote:> Hi folks, > > We recently generated some data that seemed interesting enough to > share > here. This is a comparison between compilers that ignores the > performance of the generated code and focuses only on compiler > correctness. > > volatile checksum > errors errors > > avr-gcc-3.4 1.879% 0.378% > avr-gcc-4.1 0.037% 0.256% > avr-gcc-4.2 0.753% 0.225% > gcc-3.4 1.222% 0.004% > gcc-4.0 0.041% 0.026% > gcc-4.1 0.196% 0.024% > gcc-4.2 0.770% 0.004% > gcc-4.3 0.727% 0.003% > llvm-gcc-2.2 18.712% 0.110% > llvm-gcc-pre-2.4 0.000% 0.006% >John, These numbers are terrific! Thank you for compiling them and for the bugs you filed. From the list below, it looks like there are only 2 outstanding bugs -- one in checksum and one non-terminating compilation. Very nice! -bw
Duncan Sands
2008-Jul-21 07:20 UTC
[LLVMdev] qualitative comparison of correctness of llvm and gcc
Hi John,> A "volatile error" indicates a case where a compiler failed to respect > the volatile invariant. The volatile invariant is simply that changing > the optimization level of a strictly conforming C program must not > change the number of dynamic loads or stores to any variable that is > volatile-qualified in the compiler's input. We check this with a hacked > valgrind (for x86) or a hacked simulator (for AVR).does this also check that writes are atomic: that they are performed in one processor operation? For example, I recently fixed a bug where a write of a double constant on x86-32 (which can be written atomically) was being turned into a write of an i64, which then got legalized into two i32 writes. Ciao, Duncan.
John Regehr
2008-Jul-21 15:25 UTC
[LLVMdev] qualitative comparison of correctness of llvm and gcc
Hi Duncan-> does this also check that writes are atomic: that they are performed in > one processor operation?Can you elaborate a bit? I don't think volatile has any atomicity requirements. Of course I can make a struct, an int128_t, or whatever volatile (on AVR even an int16_t is updated non-atomically!). Lack of atomicity is one of many problems with using volatile as a basis for creating correct systems code... Thanks, John
Kenneth Hoste
2008-Aug-02 18:29 UTC
[LLVMdev] qualitative comparison of correctness of llvm and gcc
Hello, On Jul 20, 2008, at 23:19 , John Regehr wrote:> These numbers come from throwing 100,000 random C programs at the > various compilers. For each compiler, we compiled each test case at > -O0, -O1, -O2, -Os, and -O3 and compared the results of these > compilations. Random programs that crashed a compiler, and > executables > that failed to terminate in a timely fashion, did not count towards > the > 100,000.Maybe I missed it in the follow-up discussion (I had to work through over 500 mails after getting back from holiday, so I wasn't always as thorough as I should), but I was wondering which technique/tool you used to generate these C programs. Maybe I'm being naive, but it doesn't seem like an easy task to generate non-trivial C programs to feed into a compiler... Also, where would I be able to find more details on this experiment? That is, is there any paper or technical report on this? Or something else with more details on the setup of this? greetings, Kenneth Hoste Phd student @ Ghent University, Belgium -- Computer Science is no more about computers than astronomy is about telescopes. (E. W. Dijkstra) Kenneth Hoste ELIS - Ghent University email: kenneth.hoste at elis.ugent.be blog: http://www.elis.ugent.be/~kehoste/blog website: http://www.elis.ugent.be/~kehoste