Very interesting study!
However as others have asked, I would like to see some aggregated data.
Also, how do you verify that the generated code is correct? How are you
systematically generating these tests?
In summary, do you have any paper coming along? :)
Thanks,
Nuno
----- Original Message -----
From: "John Regehr" <regehr at cs.utah.edu>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Saturday, July 18, 2009 6:21 AM
Subject: Re: [LLVMdev] speed and code size issues
> We have some results that are somewhat entertaining and that relate to the
> size/speed discussion.
>
> The basic idea is exhaustive generation of C functions where
"exhaustive"
> is qualified by some structural restrictions (depth of AST, node type,
> etc.).
>
> For one particular set of restrictions we ended up with about 7 million C
> functions. We then compiled each of these functions with 7 compilers:
> llvm-gcc, clang, Intel cc, Sun cc, various versions of gcc.
>
> We then looked for functions where a particular pair of compilers
> exhibited widely differing abilities to optimize. For example, consider
> this function:
>
> int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}
>
> gcc-3.4 can see that it always returns 0, and emits code doing that. On
> the other hand, llvm-gcc emits 228 bytes of object code (at -Os) to
> compute the same zeroes.
>
> The funny-named functions are little safe-math utilities that avoid
> undefined behavior for all inputs. "o1s" is "mod 16-bit
signed" and "m8s"
> is "multiply 8-bit signed".
>
> Why is this interesting? Because it provides a way to systematically find
> areas of weakness in an optimizer, relative to a collection of other
> optimizers.
>
> If people would find it useful, I can put the full set of results on the
> web when time permits. I call the resulting codes "maximally
> embarrassing" since each function represents some significant failure
to
> optimize.
>
> The global maximally embarrasing function is one where various versions of
> gcc (including llvm-gcc) emit code returning constant 0 and clang emits
> 762 bytes of x86. The C code is this:
>
> int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}
>
> The other embarrassing thing about these functions is that most compilers
> miscompile some of the 7 million functions. llvm-gcc and clang are the
> only ones we tested that actually get them all right.
>
> To compile these functions this code needs to be prepended:
>
> #include <limits.h>
> #include <stdint.h>
> #include "safe_abbrev.h"
> #include "safe_math.h"
>
> The safe math headers are here:
>
> http://www.cs.utah.edu/~regehr/safe_math/
>
> Anyway I just throw this out there. People on this list have told me
> before that missed-optimization bugs are not considered very interesting.
> The ideal result (from my point of view as a compiler consumer) would be
> for a few people from one or more of these compilers' development
> communities to take seriously the job of eliminating these embarrassments.
>
> John Regehr