Jack Howarth
2015-May-03 22:02 UTC
[LLVMdev] libiomp, not libgomp as default library linked with -fopenmp
A couple more data points. Current llvm 3.7svn with the two outstanding OPENMP patches can build the openmp support in gdl 0.9.5 (which completely passes its test suite) and apbs 1.4.1's limited openmp support.

On Sat, May 2, 2015 at 11:11 PM, Jack Howarth <howarth.mailing.lists at gmail.com> wrote:
> On a positive note, current llvm 3.7svn with the two outstanding
> OPENMP patches applied builds the openmp support in gromacs 5.0.4 and the
> resulting build fully passes the gromacs regression test suite. Tested on
> x86_64-apple-darwin14.
Andrey Bokhanko
2015-May-06 09:41 UTC
[LLVMdev] libiomp, not libgomp as default library linked with -fopenmp
Jack,

Thank you for all the testing efforts! They are really appreciated and, in my eyes, one of the best contributions to the overall OMP development effort.

Keep up the good work!

Andrey
Jack Howarth
2015-May-06 23:48 UTC
[LLVMdev] libiomp, not libgomp as default library linked with -fopenmp
Andrey,

Here is an initial attempt at benchmarking the performance of graphicsmagick 1.3.19 on x86_64-apple-darwin14, built at various optimization levels with openmp support enabled, using gcc 5.1.0 or clang svn at r236592 with...

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128555.html
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128561.html
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128567.html

The benchmark was run as described in http://www.graphicsmagick.org/OpenMP.html and produced the following results.

gcc 5.1 -O3

% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:
Results: 1 threads 14 iter 10.76s user 10.76s total 1.301 iter/s 1.301 iter/cpu 1.00 speedup 1.000 karp-flatt
Results: 2 threads 25 iter 19.75s user 10.27s total 2.434 iter/s 1.266 iter/cpu 1.87 speedup 0.069 karp-flatt
Results: 3 threads 36 iter 28.74s user 10.04s total 3.586 iter/s 1.253 iter/cpu 2.76 speedup 0.044 karp-flatt
Results: 4 threads 48 iter 38.54s user 10.21s total 4.701 iter/s 1.245 iter/cpu 3.61 speedup 0.036 karp-flatt
Results: 5 threads 58 iter 46.71s user 10.04s total 5.777 iter/s 1.242 iter/cpu 4.44 speedup 0.032 karp-flatt
Results: 6 threads 69 iter 55.76s user 10.14s total 6.805 iter/s 1.237 iter/cpu 5.23 speedup 0.029 karp-flatt
Results: 7 threads 78 iter 63.16s user 10.01s total 7.792 iter/s 1.235 iter/cpu 5.99 speedup 0.028 karp-flatt
Results: 8 threads 88 iter 71.33s user 10.02s total 8.782 iter/s 1.234 iter/cpu 6.75 speedup 0.026 karp-flatt

clang 3.7svn -O3

% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:
Results: 1 threads 19 iter 10.42s user 10.41s total 1.825 iter/s 1.823 iter/cpu 1.00 speedup 1.000 karp-flatt
Results: 2 threads 36 iter 20.15s user 10.08s total 3.571 iter/s 1.787 iter/cpu 1.96 speedup 0.022 karp-flatt
Results: 3 threads 53 iter 30.45s user 10.15s total 5.222 iter/s 1.741 iter/cpu 2.86 speedup 0.024 karp-flatt
Results: 4 threads 68 iter 39.96s user 10.00s total 6.800 iter/s 1.702 iter/cpu 3.73 speedup 0.025 karp-flatt
Results: 5 threads 83 iter 50.18s user 10.04s total 8.267 iter/s 1.654 iter/cpu 4.53 speedup 0.026 karp-flatt
Results: 6 threads 97 iter 59.97s user 10.01s total 9.690 iter/s 1.617 iter/cpu 5.31 speedup 0.026 karp-flatt
Results: 7 threads 111 iter 70.37s user 10.06s total 11.034 iter/s 1.577 iter/cpu 6.05 speedup 0.026 karp-flatt
Results: 8 threads 124 iter 79.95s user 10.04s total 12.351 iter/s 1.551 iter/cpu 6.77 speedup 0.026 karp-flatt

gcc 5.1 -O2

% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:
Results: 1 threads 13 iter 10.04s user 10.04s total 1.295 iter/s 1.295 iter/cpu 1.00 speedup 1.000 karp-flatt
Results: 2 threads 25 iter 19.86s user 10.32s total 2.422 iter/s 1.259 iter/cpu 1.87 speedup 0.069 karp-flatt
Results: 3 threads 36 iter 28.87s user 10.08s total 3.571 iter/s 1.247 iter/cpu 2.76 speedup 0.044 karp-flatt
Results: 4 threads 47 iter 37.84s user 10.03s total 4.686 iter/s 1.242 iter/cpu 3.62 speedup 0.035 karp-flatt
Results: 5 threads 58 iter 46.84s user 10.09s total 5.748 iter/s 1.238 iter/cpu 4.44 speedup 0.032 karp-flatt
Results: 6 threads 68 iter 55.06s user 10.02s total 6.786 iter/s 1.235 iter/cpu 5.24 speedup 0.029 karp-flatt
Results: 7 threads 78 iter 63.28s user 10.05s total 7.761 iter/s 1.233 iter/cpu 5.99 speedup 0.028 karp-flatt
Results: 8 threads 88 iter 71.48s user 10.02s total 8.782 iter/s 1.231 iter/cpu 6.78 speedup 0.026 karp-flatt

clang 3.7svn -O2

% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:
Results: 1 threads 19 iter 10.36s user 10.35s total 1.836 iter/s 1.834 iter/cpu 1.00 speedup 1.000 karp-flatt
Results: 2 threads 32 iter 20.63s user 10.31s total 3.104 iter/s 1.551 iter/cpu 1.69 speedup 0.183 karp-flatt
Results: 3 threads 46 iter 30.29s user 10.10s total 4.554 iter/s 1.519 iter/cpu 2.48 speedup 0.105 karp-flatt
Results: 4 threads 60 iter 40.36s user 10.09s total 5.946 iter/s 1.487 iter/cpu 3.24 speedup 0.078 karp-flatt
Results: 5 threads 73 iter 50.25s user 10.05s total 7.264 iter/s 1.453 iter/cpu 3.96 speedup 0.066 karp-flatt
Results: 6 threads 86 iter 60.44s user 10.08s total 8.532 iter/s 1.423 iter/cpu 4.65 speedup 0.058 karp-flatt
Results: 7 threads 98 iter 70.47s user 10.08s total 9.722 iter/s 1.391 iter/cpu 5.30 speedup 0.054 karp-flatt
Results: 8 threads 109 iter 79.59s user 10.02s total 10.878 iter/s 1.370 iter/cpu 5.93 speedup 0.050 karp-flatt

gcc 5.1 -Os

% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:
Results: 1 threads 12 iter 10.29s user 10.29s total 1.166 iter/s 1.166 iter/cpu 1.00 speedup 1.000 karp-flatt
Results: 2 threads 23 iter 19.56s user 10.00s total 2.300 iter/s 1.176 iter/cpu 1.97 speedup 0.014 karp-flatt
Results: 3 threads 35 iter 29.68s user 10.27s total 3.408 iter/s 1.179 iter/cpu 2.92 speedup 0.013 karp-flatt
Results: 4 threads 45 iter 38.14s user 10.04s total 4.482 iter/s 1.180 iter/cpu 3.84 speedup 0.014 karp-flatt
Results: 5 threads 56 iter 47.43s user 10.11s total 5.539 iter/s 1.181 iter/cpu 4.75 speedup 0.013 karp-flatt
Results: 6 threads 66 iter 55.89s user 10.06s total 6.561 iter/s 1.181 iter/cpu 5.63 speedup 0.013 karp-flatt
Results: 7 threads 76 iter 64.39s user 10.11s total 7.517 iter/s 1.180 iter/cpu 6.45 speedup 0.014 karp-flatt
Results: 8 threads 86 iter 72.90s user 10.11s total 8.506 iter/s 1.180 iter/cpu 7.29 speedup 0.014 karp-flatt

clang 3.7svn -Os

% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:
Results: 1 threads 19 iter 10.36s user 10.36s total 1.834 iter/s 1.834 iter/cpu 1.00 speedup 1.000 karp-flatt
Results: 2 threads 36 iter 20.50s user 10.25s total 3.512 iter/s 1.756 iter/cpu 1.92 speedup 0.044 karp-flatt
Results: 3 threads 52 iter 30.30s user 10.11s total 5.143 iter/s 1.716 iter/cpu 2.80 speedup 0.035 karp-flatt
Results: 4 threads 67 iter 40.12s user 10.03s total 6.680 iter/s 1.670 iter/cpu 3.64 speedup 0.033 karp-flatt
Results: 5 threads 82 iter 50.25s user 10.06s total 8.151 iter/s 1.632 iter/cpu 4.44 speedup 0.031 karp-flatt
Results: 6 threads 96 iter 60.23s user 10.04s total 9.562 iter/s 1.594 iter/cpu 5.21 speedup 0.030 karp-flatt
Results: 7 threads 109 iter 70.12s user 10.03s total 10.867 iter/s 1.554 iter/cpu 5.93 speedup 0.030 karp-flatt
Results: 8 threads 122 iter 79.82s user 10.03s total 12.164 iter/s 1.528 iter/cpu 6.63 speedup 0.029 karp-flatt

The interpretation of the results seems complex, as the optimal result would combine the highest iter/cpu with the highest speedup. For -O3, the results for clang 3.7svn are clearly superior to gcc 5.1 on both metrics. For -O2 and -Os, the performance (iter/cpu) is always higher for clang 3.7svn, but the speedup is not.

Jack
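For readers comparing the speedup and karp-flatt columns above, here is a minimal Python sketch of how the two figures relate. It assumes that speedup is the ratio of iter/s at N threads to iter/s at 1 thread, and that the karp-flatt column is the standard Karp-Flatt serial fraction e = (1/speedup - 1/N) / (1 - 1/N). The function names are hypothetical and this is not taken from the GraphicsMagick source.

```python
def speedup(iter_per_s, baseline_iter_per_s):
    """Throughput at N threads relative to the 1-thread run (assumed definition)."""
    return iter_per_s / baseline_iter_per_s


def karp_flatt(speedup_value, threads):
    """Karp-Flatt experimentally determined serial fraction.

    Lower is better; a value that stays flat as threads increase suggests
    the remaining overhead is serial work rather than growing parallel overhead.
    """
    if threads < 2:
        # The formula divides by (1 - 1/threads), so it is undefined for 1 thread.
        raise ValueError("Karp-Flatt metric needs at least 2 threads")
    return (1.0 / speedup_value - 1.0 / threads) / (1.0 - 1.0 / threads)


# iter/s at 1 and 8 threads, copied from the -O3 runs in this message.
runs = {
    "gcc 5.1 -O3": (1.301, 8.782),
    "clang 3.7svn -O3": (1.825, 12.351),
}

for name, (one_thread, eight_threads) in runs.items():
    s = speedup(eight_threads, one_thread)
    print(f"{name}: speedup {s:.2f}, karp-flatt {karp_flatt(s, 8):.3f}")
```

The formula is undefined for a single thread, so the 1.000 shown on the 1-thread rows is presumably a convention of the tool rather than a computed value.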