On Wed, Oct 12, 2011 at 09:40:53AM +0200, Duncan Sands wrote:
> Hi Chris,
>
>>> PS: With -fplugin-arg-dragonegg-enable-gcc-optzns the LLVM optimizers
>>> are run at the following levels:
>>>
>>>    Command line option    LLVM optimizers run at
>>>    -------------------    ----------------------
>>>    -O1                    tiny amount of optimization
>>>    -O2 or -O3             -O1
>>>    -O4 or -O5             -O2
>>>    -O6 or better          -O3
>>
>> Hi Duncan,
>>
>> Out of curiosity, why do you follow this approach? People generally use
>> -O2 or -O3. I'd recommend switching dragonegg to line those up with
>> whatever you want people to use.
>
> Note that this is done only when the GCC optimizers are run. The basic
> observation is that running the LLVM optimizers at -O3 after running the
> GCC optimizers (at -O3) results in slower code! I mean slower than what
> you get by running the LLVM optimizers at -O1 or -O2. I haven't found
> time to analyse this curiosity yet. It might simply be that the LLVM
> inlining level is too high given that inlining has already been done by
> GCC. In any case, I didn't want to run LLVM at -O3 because of this. The
> next question was: which is better, LLVM at -O1 or at -O2? My first
> experiments showed that code quality was essentially the same. Since at
> -O1 you get a nice compile-time speedup, I settled on using -O1. Also,
> -O1 makes some sense if the GCC optimizers did a good job and all that
> is needed is to clean up the mess that converting to LLVM IR can
> produce. However, later experiments showed that -O2 does seem to
> consistently result in slightly better code, so I've been thinking of
> using -O2 instead. This is one reason I encouraged Jack to use -O4 in
> his benchmarks (i.e. GCC at -O3, LLVM at -O2): to see if they show the
> same thing.

Duncan,

My preliminary runs of the pb05 benchmarks at -O4, -O5 and -O6 using
-fplugin-arg-dragonegg-enable-gcc-optzns didn't show any significant
run-time performance changes compared to
-fplugin-arg-dragonegg-enable-gcc-optzns -O3.
I'll rerun those and post the tabulated results this weekend. I am using
-ffast-math -funroll-loops as well in the optimization flags; perhaps I
should repeat the benchmarks without those flags. IMHO, the more important
thing is to fish out the remaining regressions in the llvm vectorization
code by defaulting -fplugin-arg-dragonegg-enable-gcc-optzns on in
dragonegg svn once llvm 3.0 has branched. Hopefully this will get us wider
testing of the llvm vectorization support and some additional smaller test
cases that expose the remaining bugs in that code.

Jack

> Ciao, Duncan.
>
> PS: Dragonegg is a nice platform for understanding what the GCC
> optimizers do better than LLVM. It's a pity no-one seems to have used it
> for this.
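[Editorial aside: the level mapping Duncan describes can be sketched as
below. This is a hypothetical illustration of the table, not dragonegg's
actual implementation, which lives in C++ inside the plugin.]

```python
def llvm_opt_level(gcc_level):
    """Map the GCC -O level to the LLVM optimizer level used when
    -fplugin-arg-dragonegg-enable-gcc-optzns is given, per Duncan's table.
    Illustrative only; not dragonegg's real code."""
    if gcc_level <= 1:
        return 0          # tiny amount of optimization
    elif gcc_level <= 3:  # -O2 or -O3
        return 1
    elif gcc_level <= 5:  # -O4 or -O5
        return 2
    else:                 # -O6 or better
        return 3
```

So "-O4" in this thread means: GCC optimizers at their usual level, plus
the LLVM IR optimizers at -O2.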
Hi Jack,

> IMHO, the more important thing is to fish out the remaining regressions
> in the llvm vectorization code by defaulting
> -fplugin-arg-dragonegg-enable-gcc-optzns on in dragonegg svn once llvm
> 3.0 has branched. Hopefully this will get us wider testing of the llvm
> vectorization support and some additional smaller test cases that expose
> the remaining bugs in that code.

Turning on the GCC optimizers by default essentially means giving up on
the LLVM IR optimizers: one way of reading your benchmark results is that
the LLVM IR optimizers don't do anything useful that the GCC optimizers
haven't done already. The fact that LLVM -O3 and -O2 don't produce better
code than -O1 suggests that all that is needed is a little bit of
optimization to clean up the inevitable messy bits produced by the
gimple -> LLVM IR conversion, but that otherwise GCC already did all the
interesting transforms. Should this be considered an LLVM bug or a
dragonegg feature?

An LLVM bug: if the GCC optimizers work better than LLVM's, then LLVM
should be improved until LLVM's are better. Turning on the GCC optimizers
by default just hides the weaknesses of LLVM's optimizers and reduces the
pressure to improve things.

A dragonegg feature: users want their code to run fast. Turning on the GCC
optimizers results in faster code, ergo the GCC optimizers should be
turned on by default. That way you get faster compile times and fast code.

I have some sympathy for both viewpoints...

Ciao, Duncan.
On Thu, Oct 13, 2011 at 02:37:54PM +0200, Duncan Sands wrote:
> Hi Jack,
>
>> IMHO, the more important thing is to fish out the remaining regressions
>> in the llvm vectorization code by defaulting
>> -fplugin-arg-dragonegg-enable-gcc-optzns on in dragonegg svn once llvm
>> 3.0 has branched. Hopefully this will get us wider testing of the llvm
>> vectorization support and some additional smaller test cases that
>> expose the remaining bugs in that code.
>
> turning on the GCC optimizers by default essentially means giving up on
> the LLVM IR optimizers: one way of reading your benchmark results is
> that the LLVM IR optimizers don't do anything useful that the GCC
> optimizers haven't done already. The fact that LLVM -O3 and -O2 don't
> produce better code than -O1 suggests that all that is needed is a
> little bit of optimization to clean up the inevitable messy bits
> produced by the gimple -> LLVM IR conversion, but that otherwise GCC
> already did all the interesting transforms. Should this be considered an
> LLVM bug or a dragonegg feature?
>
> An LLVM bug: if the GCC optimizers work better than LLVM's then LLVM
> should be improved until LLVM's are better. Turning on the GCC
> optimizers by default just hides the weaknesses of LLVM's optimizers,
> and reduces the pressure to improve things.
>
> A dragonegg feature: users want their code to run fast. Turning on the
> GCC optimizers results in faster code, ergo the GCC optimizers should be
> turned on by default. That way you get faster compile times and fast
> code.

Duncan,

My main concern is that we test the vectorization support in llvm as hard
as possible post llvm 3.0. Considering that llvm is unlikely to get
autovectorization support in the near term, it seems that FSF
gcc/dragonegg is the best approach to hunt for vectorization issues in
llvm.
Might we be able to split the difference here and create a variant of
-fplugin-arg-dragonegg-enable-gcc-optzns which only enables a limited set
of FSF gcc optimizations (such as -ftree-vectorize) required to enable FSF
gcc's autovectorization under dragonegg? For instance, couldn't dragonegg
just honor -ftree-vectorize when it or -O3 is passed as a compiler flag?

Jack

> I have some sympathy for both viewpoints...
>
> Ciao, Duncan.
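[Editorial aside: Jack's proposal could be sketched roughly as below. The
function and pass names are invented for illustration and do not
correspond to any actual dragonegg option or GCC pass identifier.]

```python
def gcc_optimizations_to_enable(flags):
    """Hypothetical sketch of the proposal: rather than enabling all of
    GCC's optimizers, run only tree vectorization when the user passes
    -ftree-vectorize explicitly or implies it via -O3 (which enables
    -ftree-vectorize in GCC of this era). The returned pass name is
    purely illustrative."""
    if "-ftree-vectorize" in flags or "-O3" in flags:
        return ["tree-vectorize"]
    return []
```

Under this scheme, the rest of GCC's optimization pipeline would stay off,
so the LLVM IR optimizers would still do the bulk of the work and keep
getting exercised.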