thr3ads.net - llvm dev - [LLVMdev] speed and code size issues [Jul 2009]

Nuno Lopes

2009-Jul-18 18:21 UTC

[LLVMdev] speed and code size issues

Very interesting study!
However as others have asked, I would like to see some aggregated data. 
Also, how do you verify that the generated code is correct?  How are you 
systematically generating these tests?
In summary, do you have any paper coming along? :)

Thanks,
Nuno

----- Original Message ----- 
From: "John Regehr" <regehr at cs.utah.edu>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Saturday, July 18, 2009 6:21 AM
Subject: Re: [LLVMdev] speed and code size issues

> We have some results that are somewhat entertaining and that relate to the
> size/speed discussion.
>
> The basic idea is exhaustive generation of C functions where "exhaustive"
> is qualified by some structural restrictions (depth of AST, node type,
> etc.).
>
> For one particular set of restrictions we ended up with about 7 million C
> functions.  We then compiled each of these functions with 7 compilers:
> llvm-gcc, clang, Intel cc, Sun cc, various versions of gcc.
>
> We then looked for functions where a particular pair of compilers
> exhibited widely differing abilities to optimize.  For example, consider
> this function:
>
>   int ZZ_0000728f(int x,int y){return o1s(m8s(x,-2),(x?1:y));}
>
> gcc-3.4 can see that it always returns 0, and emits code doing that.  On
> the other hand, llvm-gcc emits 228 bytes of object code (at -Os) to
> compute the same zeroes.
>
> The funny-named functions are little safe-math utilities that avoid
> undefined behavior for all inputs.  "o1s" is "mod 16-bit signed" and
> "m8s" is "multiply 8-bit signed".
>
> Why is this interesting?  Because it provides a way to systematically find
> areas of weakness in an optimizer, relative to a collection of other
> optimizers.
>
> If people would find it useful, I can put the full set of results on the
> web when time permits.  I call the resulting codes "maximally
> embarrassing" since each function represents some significant failure to
> optimize.
>
> The global maximally embarrassing function is one where various versions of
> gcc (including llvm-gcc) emit code returning constant 0 and clang emits
> 762 bytes of x86.  The C code is this:
>
>   int ZZ_00005bbd(int x,int y){return m1s((x?0:x),a8s(y,y));}
>
> The other embarrassing thing about these functions is that most compilers
> miscompile some of the 7 million functions.  llvm-gcc and clang are the
> only ones we tested that actually get them all right.
>
> To compile these functions this code needs to be prepended:
>
>   #include <limits.h>
>   #include <stdint.h>
>   #include "safe_abbrev.h"
>   #include "safe_math.h"
>
> The safe math headers are here:
>
>   http://www.cs.utah.edu/~regehr/safe_math/
>
> Anyway I just throw this out there.  People on this list have told me
> before that missed-optimization bugs are not considered very interesting.
> The ideal result (from my point of view as a compiler consumer) would be
> for a few people from one or more of these compilers' development
> communities to take seriously the job of eliminating these embarrassments.
>
> John Regehr
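
A minimal sketch of why ZZ_0000728f from the quoted message always returns 0. The two helpers below are hypothetical stand-ins, not the code in safe_math.h: they assume only what the message states (8-bit signed multiply, 16-bit signed mod, no undefined behavior) plus the assumption that an operation which would be undefined returns its first operand. If x is nonzero, the divisor is 1 and anything mod 1 is 0; if x is zero, the dividend is 0 * -2 = 0, and 0 mod y is 0 (or the first operand, also 0, when y is 0).

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED semantics, not copied from safe_math.h: arguments are narrowed
   to the named width by the prototype, and an operation that would be
   undefined returns its first operand instead. */
int8_t sketch_m8s(int8_t a, int8_t b)
{
    /* int8_t operands promote to int, so the product cannot overflow;
       narrowing the result back to int8_t is implementation-defined. */
    return (int8_t)(a * b);
}

int16_t sketch_o1s(int16_t a, int16_t b)
{
    if (b == 0)
        return a;               /* avoid division by zero */
    return (int16_t)(a % b);    /* computed in int after promotion */
}

/* Same shape as ZZ_0000728f, using the stand-in helpers. */
int sketch_ZZ_0000728f(int x, int y)
{
    return sketch_o1s(sketch_m8s(x, -2), (x ? 1 : y));
}

/* Count the (x, y) pairs in [lo, hi] x [lo, hi] with a nonzero result. */
long count_nonzero(int lo, int hi)
{
    long nonzero = 0;
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (sketch_ZZ_0000728f(x, y) != 0)
                nonzero++;
    return nonzero;
}
```

Calling count_nonzero over any range should return 0 under these assumed semantics, which is the constant that gcc-3.4 is reported to emit directly.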

John Regehr

2009-Jul-18 19:57 UTC

[LLVMdev] speed and code size issues

Hi Nuno,

The "right answer" for each function for a given input is determined by
voting; any compiler whose output disagrees with the majority is considered
to be wrong.  So far, there is always a majority that agree on the answer.
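
The voting step can be sketched roughly as follows. This is an illustration under assumptions, not the actual test harness: the function names are hypothetical, and each compiler's output for one (function, input) pair is assumed to be reduced to a single int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the value produced by a strict majority of
   compilers. Returns 1 and stores the value in *winner if more than half
   of the n results agree; returns 0 otherwise. */
int majority_result(const int *results, size_t n, int *winner)
{
    int candidate = 0;
    size_t count = 0;

    assert(results != NULL && winner != NULL);

    /* Boyer-Moore pass: a strict-majority value, if any, survives. */
    for (size_t i = 0; i < n; i++) {
        if (count == 0) {
            candidate = results[i];
            count = 1;
        } else if (results[i] == candidate) {
            count++;
        } else {
            count--;
        }
    }

    /* Verification pass: confirm the candidate really is a majority. */
    count = 0;
    for (size_t i = 0; i < n; i++)
        if (results[i] == candidate)
            count++;

    if (count * 2 > n) {
        *winner = candidate;
        return 1;
    }
    return 0;
}

/* Mark each compiler whose result differs from the majority as suspect.
   Returns the number flagged, or -1 if no majority exists. */
int flag_outliers(const int *results, size_t n, int *suspect)
{
    int winner;
    int flagged = 0;

    if (!majority_result(results, n, &winner))
        return -1;
    for (size_t i = 0; i < n; i++) {
        suspect[i] = (results[i] != winner);
        flagged += suspect[i];
    }
    return flagged;
}
```

With seven compilers, a result vector such as {0, 0, 0, 5, 0, 0, 0} flags only the fourth compiler. The -1 return covers the case, not yet observed according to the message above, where no majority exists.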

Rather than testing, I'd like to use an equivalence checker for object 
code.  We're working on borrowing one, but don't have anything yet. 
Equivalence checkers for object code are still very much at the research 
stage, it seems.  Luckily, for this particular purpose the checker does 
not need to scale to large inputs.

Generating the tests is really easy: DFS with bounded depth over a grammar 
for a C subset.  Then it takes a bit of effort to prune stupid programs, 
such as those that attempt to shift by <0 or >=bitwidth.
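
As a rough illustration of bounded-depth DFS with pruning, here is a sketch over a toy grammar. Everything in it is an assumption made for the example: the leaves, the operator names "add" and "shl" (these are not the abbreviations from safe_abbrev.h), the 32-bit width, and the single pruning rule that rejects a constant shift amount outside [0, 32).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 64   /* a full binary tree of depth 5 has 63 nodes */
#define BITWIDTH   32   /* assumed width of int */

/* Toy grammar: expr := leaf | op(expr, expr) */
static const char *const LEAVES[] = { "x", "y", "1", "-1", "40" };
static const int LEAF_IS_CONST[]  = { 0, 0, 1, 1, 1 };
static const int LEAF_VALUE[]     = { 0, 0, 1, -1, 40 };
static const char *const OPS[]    = { "add", "shl" };
enum { N_LEAVES = 5, N_OPS = 2, OP_SHL = 1 };

/* An unfilled position in the tree: remaining depth budget and whether it
   is the right operand of a shift. */
typedef struct { int depth; int is_shift_amount; } Hole;
typedef void (*Emit)(const int *tokens, int len, void *ctx);

/* Fill holes in preorder; the top of the stack is the next hole. */
static void dfs(int *tokens, int len, Hole *holes, int nholes,
                Emit emit, void *ctx)
{
    if (nholes == 0) {              /* complete expression */
        emit(tokens, len, ctx);
        return;
    }
    Hole h = holes[nholes - 1];
    for (int i = 0; i < N_LEAVES; i++) {
        /* Prune: constant shift amounts must lie in [0, BITWIDTH). */
        if (h.is_shift_amount && LEAF_IS_CONST[i] &&
            (LEAF_VALUE[i] < 0 || LEAF_VALUE[i] >= BITWIDTH))
            continue;
        tokens[len] = i;
        dfs(tokens, len + 1, holes, nholes - 1, emit, ctx);
    }
    if (h.depth > 0) {
        for (int op = 0; op < N_OPS; op++) {
            tokens[len] = N_LEAVES + op;
            /* Push right operand first so the left one is filled first. */
            holes[nholes - 1] = (Hole){ h.depth - 1, op == OP_SHL };
            holes[nholes]     = (Hole){ h.depth - 1, 0 };
            dfs(tokens, len + 1, holes, nholes + 1, emit, ctx);
        }
    }
    holes[nholes - 1] = h;          /* restore the stack for the caller */
}

/* Enumerate every expression whose AST depth is at most max_depth. */
void enumerate(int max_depth, Emit emit, void *ctx)
{
    int tokens[MAX_TOKENS];
    Hole holes[MAX_TOKENS];
    assert(max_depth >= 0 && max_depth <= 5);
    holes[0] = (Hole){ max_depth, 0 };
    dfs(tokens, 0, holes, 1, emit, ctx);
}

/* Append the expression starting at tokens[pos]; returns the next index. */
static int render(const int *tokens, int pos, char *out, size_t cap)
{
    int t = tokens[pos++];
    if (t < N_LEAVES) {
        strncat(out, LEAVES[t], cap - strlen(out) - 1);
        return pos;
    }
    strncat(out, OPS[t - N_LEAVES], cap - strlen(out) - 1);
    strncat(out, "(", cap - strlen(out) - 1);
    pos = render(tokens, pos, out, cap);
    strncat(out, ",", cap - strlen(out) - 1);
    pos = render(tokens, pos, out, cap);
    strncat(out, ")", cap - strlen(out) - 1);
    return pos;
}

/* Emit callback: count expressions; ctx points to a long. */
void count_expr(const int *tokens, int len, void *ctx)
{
    (void)tokens; (void)len;
    ++*(long *)ctx;
}

/* Emit callback: print one C function; ctx points to a long serial. */
void print_function(const int *tokens, int len, void *ctx)
{
    char body[512] = "";
    long *serial = ctx;
    (void)len;
    render(tokens, 0, body, sizeof body);
    printf("int ZZ_%08lx(int x,int y){return %s;}\n",
           (unsigned long)(*serial)++, body);
}
```

Calling enumerate(2, print_function, &serial) with a long serial set to 0 prints one function per surviving expression. The count grows roughly as the square of the previous depth's count, so a real generator needs tight structural restrictions to keep the total in the millions.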

> In summary, do you have any paper coming along? :)

We intend to write one but I'm waiting to see what aspects of this turn 
out to be interesting or useful!  Really this is just an offshoot from our 
random testing work where DFS provides an alternative way to drive the 
program generator.

John Regehr

Nuno Lopes

2009-Jul-19 14:09 UTC

[LLVMdev] speed and code size issues

Hi John,

Thanks for the answers. I'm looking forward to the paper ;)

Nuno

