thr3ads.net - llvm dev - [LLVMdev] current llvm vs. current gcc code size [Aug 2008]

If this information is useful, please help other people find it:
Share via:

regehr

2008-Aug-21 03:50 UTC

[LLVMdev] current llvm vs. current gcc code size

This is a bit random but perhaps interesting:

Attached is a histogram of code sizes for about 50,000 random C programs 
compiled by recent versions of llvm and gcc.  x-axis is bytes in the 
text segment and y-axis is number of programs in each 100-byte bucket.

Code size for a program is taken to be the smallest code size across 
-O0, -O1, -O2, -O3, and -Os, for a given compiler.

I don't know why each histogram has two peaks, or why llvm's left-hand 
peak is at a lower code size than gcc's left-hand peak.  Perhaps the 
left-hand peaks indicate trivial random programs that compile completely 
away and then llvm has a more concise crt0.

Anyway the interesting effect here is the rightward shift of the main 
bulk of the binaries produced by llvm vs. gcc.  This would seem to 
indicate that there are optimization opportunities being missed by llvm. 
  Ideally, there would be some automatic way to identify these 
opportunities, perhaps using delta-debugging techniques.  If this seems 
interesting to people I can look into it more...

This is targeting x86/Linux.

John
-------------- next part --------------
A non-text attachment was scrubbed...
Name: size.png
Type: image/png
Size: 4094 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080820/1e376bba/attachment.png>

Dale Johannesen

2008-Aug-22 18:27 UTC

head link

[LLVMdev] current llvm vs. current gcc code size

This is interesting.  Nobody has tried very hard to get -Os to work  
well AFAIK, so it's that not surprising it doesn't:)  Aside from  
optimization knobs, I don't think there's anything in x86 instruction  
selection to pick small code sequences, which can be important.  I  
suspect -Os will become important later though.

One thing that might be going on is differing default settings.  These  
are worth checking out:
Frame pointer elimination (small prolog/epilog)?
Relocation model (PIC, dynamic-no-pic, static?)
strict aliasing?

On Aug 20, 2008, at 8:50 PMPDT, regehr wrote:
> This is a bit random but perhaps interesting:
>
> Attached is a histogram of code sizes for about 50,000 random C  
> programs compiled by recent versions of llvm and gcc.  x-axis is  
> bytes in the text segment and y-axis is number of programs in each  
> 100-byte bucket.
>
> Code size for a program is taken to be the smallest code size across  
> -O0, -O1, -O2, -O3, and -Os, for a given compiler.
>
> I don't know why each histogram has two peaks, or why llvm's left- 
> hand peak is at a lower code size than gcc's left-hand peak.   
> Perhaps the left-hand peaks indicate trivial random programs that  
> compile completely away and then llvm has a more concise crt0.
>
> Anyway the interesting effect here is the rightward shift of the  
> main bulk of the binaries produced by llvm vs. gcc.  This would seem  
> to indicate that there are optimization opportunities being missed  
> by llvm.  Ideally, there would be some automatic way to identify  
> these opportunities, perhaps using delta-debugging techniques.  If  
> this seems interesting to people I can look into it more...
>
> This is targeting x86/Linux.
>
> John-------------- next part --------------
A non-text attachment was scrubbed...
Name: size.png
Type: image/png
Size: 4094 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080822/e5bc810d/attachment.png>
-------------- next part --------------> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

John Regehr

2008-Aug-23 19:09 UTC

head link

[LLVMdev] current llvm vs. current gcc code size

On Fri, 22 Aug 2008, Dale Johannesen wrote:
> This is interesting.  Nobody has tried very hard to get -Os to work well 
> AFAIK, so it's that not surprising it doesn't:)  Aside from
optimization
> knobs, I don't think there's anything in x86 instruction selection
to pick
> small code sequences, which can be important.
Yeah.  In my view on x86 -Os should mainly boil down to keeping loop 
unrolling and function inlining under tight control.

Unrolling is pretty easy (don't do it) but inlining is tricky since 
limited inlining can significantly reduce code size, for some inputs.  Of 
course if you go too far there's major bloat.

I find LLVM's inlining heuristics to be pretty ad hoc.  There is an easy, 
more-or-less principled way to minimize code size using inlining, if you 
can see the whole program.  For each function, inline all calls to it if

   (s-o)(c-1) < x

where s is the estimated size of the function, o is the size overhead of 
the calling convention, c is the number of static callsites, and x is a 
magic parameter that tries to account for the fact that often there are 
added benefits to inlining because it enables subsequent intraprocedural 
optimizations.  x can be chosen empirically.  Of course this model of 
code bloat can be extended in any number of ways but it's pretty good 
as-is.

To the extent that LLVM becomes a compiler of choice for embedded 
processors, -Os will be important in the long run.

John

Maybe Matching Threads

Search for more seemingly similar threads

llvm dev - Aug 2008 - [LLVMdev] current llvm vs. current gcc code size

[LLVMdev] current llvm vs. current gcc code size

[LLVMdev] current llvm vs. current gcc code size

[LLVMdev] current llvm vs. current gcc code size

Maybe Matching Threads