On Tue, Apr 03, 2012 at 08:33:33AM -0500, Hal Finkel wrote:
> On Tue, 3 Apr 2012 08:57:51 -0400
> Jack Howarth <howarth at bromo.med.uc.edu> wrote:
>
> > On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> > > Hi Jack,
> > >
> > >> Attached are the Polyhedron 2005 benchmark results for current
> > >> llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode
> > >> 4.3.2 and FSF gcc 4.6.3.
> > >
> > > thanks for the numbers. How does this compare to LLVM 3.0 - were
> > > there any regressions?
> >
> > The results from just before llvm/dragonegg 3.0 was released are at...
> >
> > http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html
> >
> > It does look as if the ac benchmark has been regressed from 10.80 sec
> > in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These are
> > slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3 but I would
> > be shocked if that was the origin of the performance regression).
> > The results for -fplugin-arg-dragonegg-enable-gcc-optzns doesn't
> > seem much improved in llvm 3.1 so I assume this means little progress
> > was made in eliminating the scalarization of vectorizations in this
> > release. Did we even get any code added to llvm that would allow us
> > to identify instances of these scalarizations through a compiler
> > warning? Also, the current
> > -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do
> > almost nothing in terms of vectorization. Do we need to pass any
> > additional flags to actually achieve autovectorization via llvm
>
> Currently, we only have basic-block vectorization, so to get
> autovectorization of loops (which is probably what we want here), the
> loops need to be unrolled. I see that all categories include
> -funroll-loops, does that do anything if we're not using gcc's
> optimizations?
>
> I generally run with both -unroll-allow-partial and -unroll-runtime so
> that llvm's unroller will do as much as it can. Also, in many of these
> cases, it looks like the vectorization is doing *something*, just not
> anything overly helpful ;) -vectorize is new, so it is helpful to
> get feedback on what is actually useful.
>
> You might try including -bb-vectorize-aligned-only (sse3 does not
> actually have unaligned load/stores, right?). Other things to try
> include -bb-vectorize-no-ints (determining when to vectorize integer
> ops may be trickier than floating-point ops) and setting the required
> chain depth to something less than the current default of 6 (for
> example, -bb-vectorize-req-chain-depth=3) will cause a lot more
> vectorization.

So I guess each of these needs to be passed in its own instance of
-fplugin-arg-dragonegg-llvm-option. I'll try...

de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
-fplugin-arg-dragonegg-llvm-option=-vectorize
-fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial
-fplugin-arg-dragonegg-llvm-option=-unroll-runtime
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints %n.f90 -o %n
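
For anyone scripting these runs, the pattern above (one
-fplugin-arg-dragonegg-llvm-option= per LLVM flag) could be generated
rather than typed out. A rough Python sketch; the helper name
dragonegg_command and the argument-list layout are my own assumptions
for illustration, not anything dragonegg provides:

```python
# Hypothetical helper (not part of dragonegg): wrap each LLVM flag in its
# own -fplugin-arg-dragonegg-llvm-option= instance, as in the command above.
PLUGIN_PREFIX = "-fplugin-arg-dragonegg-llvm-option="

# Base flags copied from the benchmark configurations in this thread.
BASE_FLAGS = ["-msse3", "-ffast-math", "-funroll-loops", "-O3"]


def dragonegg_command(llvm_flags, source, output, compiler="de-gfortran46"):
    """Return the compiler invocation as an argument list."""
    wrapped = [PLUGIN_PREFIX + flag for flag in llvm_flags]
    return [compiler, *BASE_FLAGS, *wrapped, source, "-o", output]


if __name__ == "__main__":
    cmd = dragonegg_command(
        ["-vectorize", "-unroll-allow-partial", "-unroll-runtime",
         "-bb-vectorize-aligned-only", "-bb-vectorize-no-ints"],
        "ac.f90", "ac")
    print(" ".join(cmd))
```

Returning a list instead of a single string keeps each argument intact
if the result is handed to something like subprocess.run.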

Unfortunately it doesn't seem that dragonegg can currently parse something
like...

-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3

% de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 \
    -fplugin-arg-dragonegg-llvm-option=-vectorize \
    -fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial \
    -fplugin-arg-dragonegg-llvm-option=-unroll-runtime \
    -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only \
    -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints \
    -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 \
    ac.f90 -o ac
f951: error: malformed option
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 (multiple
'=' signs)
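
Judging only from the error text, the option is rejected because the
forwarded flag itself contains an '=' on top of the one in the wrapper.
A small Python sketch of a pre-flight check that would catch this before
invoking the compiler; the function name and the "no '=' in the forwarded
flag" rule are assumptions inferred from the message above, not taken
from the dragonegg or gcc sources:

```python
PLUGIN_PREFIX = "-fplugin-arg-dragonegg-llvm-option="


def rejected_for_extra_equals(llvm_flags):
    """Return the flags that would put a second '=' into the plugin argument.

    Assumption: an argument is rejected whenever the forwarded LLVM flag
    contains '=', since the wrapper already contributes one '=' sign.
    """
    return [flag for flag in llvm_flags if "=" in flag]


if __name__ == "__main__":
    flags = ["-vectorize", "-bb-vectorize-no-ints",
             "-bb-vectorize-req-chain-depth=3"]
    for flag in rejected_for_extra_equals(flags):
        print("would be rejected:", PLUGIN_PREFIX + flag)
```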

Duncan, any idea how to work around that for passing
-bb-vectorize-req-chain-depth=3?
          Jack

>
> -Hal
>
> > (in absence of -ftree-vectorize and
> > -fplugin-arg-dragonegg-enable-gcc-optzns)? Jack
> >
> > >
> > > Ciao, Duncan.
> > >
> > > >> The benchmarks for -msse3 and -msse4 appear identical (at least
> > > >> for degg+optnz). This is fortunate since there seems to be a bug
> > > >> in -msse4 on 2.33 GHz (T7600) Intel Core 2 Duo Merom
> > > >> (http://llvm.org/bugs/show_bug.cgi?id=12434). Jack
> > >>
> > >> llvm/dragonegg r153877
> > >>
> > >> dragonegg:
> > >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> > >>
> > >> degg+vectorize:
> > >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> > >> -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n
> > >>
> > >> degg+optnz:
> > >> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
> > >> -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
> > >>
> > >> gfortran:
> > >> gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
> > >>
> > >> Ave Run (secs)
> > >>                 dragonegg  degg+vectorize  degg+optnz  gfortran
> > >> ac                  12.45           12.45        8.85      8.80
> > >> aermod              16.15           16.05       14.80     17.48
> > >> air                  7.10            7.11        6.46      5.50
> > >> capacita            40.00           39.96       37.72     32.62
> > >> channel              2.16            2.15        1.99      1.84
> > >> doduc               29.13           28.41       27.48     26.74
> > >> fatigue              8.75            9.03        8.11      8.44
> > >> gas_dyn             11.72           11.80        4.47      4.26
> > >> induct              24.02           24.91       12.08     13.65
> > >> linpk               15.40           15.78       15.74     15.45
> > >> mdbx                11.80           12.22       11.86     11.20
> > >> nf                  28.45           28.50       29.25     27.91
> > >> protein             38.15           39.26       37.87     32.49
> > >> rnflow              32.25           32.35       26.47     24.06
> > >> test_fpu            11.34           11.35        9.31      8.04
> > >> tftt                 1.91            1.92        1.93      1.87
> > >>
> > >> Geometric Mean      13.50           13.62       11.34     10.87
> > >>
> > >> Compile (secs)
> > >>                 dragonegg  degg+vectorize  degg+optnz  gfortran
> > >> ac                   0.33            0.38        0.72      1.27
> > >> aermod              25.91           27.58       32.34     43.91
> > >> air                  1.07            1.25        1.52      2.25
> > >> capacita             0.49            0.52        0.89      1.71
> > >> channel              0.29            0.36        0.50      0.62
> > >> doduc                1.71            4.50        3.25      5.34
> > >> fatigue              0.84            0.97        1.19      1.76
> > >> gas_dyn              0.67            0.68        1.20      3.02
> > >> induct               1.60            2.14        2.82      3.99
> > >> linpk                0.22            0.24        0.47      0.78
> > >> mdbx                 0.63            0.77        1.16      1.85
> > >> nf                   0.37            0.40        0.70      1.66
> > >> protein              0.93            1.02        1.75      4.01
> > >> rnflow               1.20            1.25        2.63      5.44
> > >> test_fpu             0.88            0.92        2.13      4.39
> > >> tftt                 0.21            0.24        0.34      0.56
> > >>
> > >> Executable (bytes)
> > >>                 dragonegg  degg+vectorize  degg+optnz  gfortran
> > >> ac                  26856           26856       39120     50968
> > >> aermod            1043700         1055988     1046288   1265640
> > >> air                 62004           62004       53740     73988
> > >> capacita            41416           41416       45552     73896
> > >> channel             22808           22808       26768     34784
> > >> doduc              128448          128448      136996    197240
> > >> fatigue             69824           69824       69840     86080
> > >> gas_dyn             59112           59112       67416    119744
> > >> induct             163152          167248      167344    174976
> > >> linpk               18752           18752       27056     38648
> > >> mdbx                53692           53692       57884     82112
> > >> nf                  23960           23960       32104     71800
> > >> protein             75032           75032       87208    132040
> > >> rnflow              71896           71896       96632    181120
> > >> test_fpu            54272           54272       78776    155072
> > >> tftt                18640           18640       18488     30768
> > >>
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> --
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory