Hi Kenneth,
On Thursday 17 April 2008 20:50:24 Kenneth Hoste wrote:
> On 17 Apr 2008, at 20:39, Chris Lattner wrote:
> > On Thu, 17 Apr 2008, HyperQuantum wrote:
> >> I have been wondering why llvm-ld generates the same code with or
> >> without the option "-O5" so I looked at its source (llvm 2.2). And
> >> apparently, the options "-On" are accepted but never used! The program
> >> runs a fixed set of optimization passes, unless "-disable-opt" is
> >> specified. What is the reason for this? If this is intended, then the
> >> documentation should say something about this IMHO.
> >
> > Probably the standard answer: "because no one has done it yet" :)
>
> I don't want to sound too much like advertising our work, but COLE
> (see the CGO paper at http://www.elis.ugent.be/~kehoste) seems to be
> extremely well suited to figure out which passes should go where for
> llvm-ld... Same goes for llvm-gcc and opt btw :-)
>
> I hope I can find the time sometime soon to work on this...
This sounds great! You should concentrate on opt, since with that
you can schedule any passes you like (often multiple times) in any
order. There should be no difference between using llvm-gcc at some
-O level, and running it at -O0 and using opt to run the passes on the
unoptimized bitcode. Likewise for llvm-ld: all it does is link the
supplied bitcode into one monster bitcode module, run a bunch of passes
on it, then turn the result into object code. You get exactly the same
effect by linking the bitcode without optimization (using llvm-link),
using opt to run the list of passes on that, then turning the resulting
optimized bitcode into object code.
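For example (an untested sketch; the pass selection here is only
illustrative, not the actual list that llvm-ld runs):

  # Compile each unit to unoptimized bitcode.
  llvm-gcc -O0 -c -emit-llvm a.c -o a.bc
  llvm-gcc -O0 -c -emit-llvm b.c -o b.bc

  # Link into one module, as llvm-ld would.
  llvm-link a.bc b.bc -o linked.bc

  # Run whatever passes you like, in any order, as often as you like.
  opt -mem2reg -instcombine -simplifycfg linked.bc -o linked.opt.bc

  # Turn the optimized bitcode into native code.
  llc linked.opt.bc -o linked.s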
Where llvm-ld differs from llvm-gcc is in the kind of bitcode that
the optimizations need to be run on. The usual procedure is to
run llvm-gcc (with optimization) on each compilation unit, producing
bitcode. Then all the bitcode files for the program are passed to
llvm-ld for link-time optimization. So when running llvm-ld, passes
are run on a module where each function has already been maximally
optimized, as far as local optimizations go. The difference is that
now you have bodies for all the functions, so inter-procedural passes
(inlining, argument promotion, interprocedural constant propagation,
elimination of unused functions, etc.) become much more relevant and
effective.
In short, llvm-gcc and llvm-ld represent two different classes of
bitcode being fed into opt, so presumably different sets of passes
are appropriate.
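So a link-time opt invocation might emphasize the inter-procedural
passes, something like this (again just a guess at a plausible
selection, not llvm-ld's actual list):

  # Passes that only pay off once every function body is visible.
  opt -internalize -ipconstprop -argpromotion -inline -globaldce \
      linked.bc -o linked.lto.bc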
What would be great is to compute an optimal set of passes and -O
levels for llvm-gcc and llvm-ld. It makes sense to start with llvm-gcc.
Presumably what you would do is take a large body of code (SPEC, etc.),
and compile each compilation unit down to bitcode using llvm-gcc at
-O0. These bitcode files would be input to COLE. What COLE would
have to do is run opt on the bitcode with varying selections of passes,
and come up with optimal sets of passes based on various criteria
(execution speed, bitcode size [this one is the easiest to measure,
so maybe start there], etc.), which can then be turned into -O levels
for llvm-gcc. I think you can assume that llvm-ld will be run at the
same -O level as llvm-gcc. So once you have llvm-gcc sorted, the next
step would be to compile all the benchmarks using llvm-gcc at -Owhatever,
and llvm-link the result into a mega-monster module. This results in
one module for each benchmark. Then COLE can schedule various passes on
the monster modules to see what is most effective for this link-time
optimization. If you used criteria X, Y and Z to determine a good set of
passes for -Owhatever, probably the same X, Y and Z should be used to
determine the llvm-ld passes at -Owhatever.
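A toy version of that search, using bitcode size as the criterion
(only meant to illustrate the idea, not what COLE actually does; the
candidate sequences are made up):

  # Try a few candidate pass sequences on a benchmark module and
  # report the one giving the smallest bitcode.
  for passes in "-mem2reg -simplifycfg" \
                "-mem2reg -instcombine -simplifycfg" \
                "-instcombine -mem2reg -instcombine -simplifycfg"
  do
    opt $passes benchmark.bc -o tmp.bc
    echo "`wc -c < tmp.bc` $passes"
  done | sort -n | head -1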
Ciao,
Duncan.