Maho NAKATA
2010-Apr-12 04:37 UTC
Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Hi FreeBSD developers,

[The original article in Japanese can be found at http://blog.goo.ne.jp/nakatamaho/e/b5f6fbc3cc6e1ac4947463eb1ca4eb0a ]

*Abstract*
I compared the peak performance of FreeBSD 8.0/amd64 and Ubuntu 9.10/amd64 using dgemm (a linear algebra routine for matrix-matrix multiplication). I obtained only about 70% of theoretical peak performance on FreeBSD 8/amd64 and almost 95% on Ubuntu 9.10/amd64. I'm really disappointed.

*Introduction*
I'm a friend of Gotoh Kazushige, the principal developer of GotoBLAS. He told me that FreeBSD is not a suitable OS for scientific computing or high performance computing. He says (in Japanese; the translation is mine):

> I guess FreeBSD does page coloring, but I don't think FreeBSD takes into account the very
> large caches that recent CPUs have. Support for very large caches on Linux is still not very
> sophisticated, but on the *BSDs it is worse; they use too fine-grained a memory allocation
> method, so we cannot expect large contiguous physical memory allocations.
> Moreover, process scheduling is not so good, as *BSD employs an algorithm that
> moves a job between physical CPUs in turn instead of dedicating one core to this kind of job.
> Take your own benchmark, and you'll see.

*Result*
Machine: Core i7 920 (42.56-44.8 GFlops) / DDR3 1066
OS: FreeBSD 8.0/amd64 and Ubuntu 9.10
GotoBLAS2: 1.13

dgemm result
OS      : FLOPS            : percent of peak
FreeBSD : 32.0 GFlops      : 71%
Ubuntu  : 42.0-42.7 GFlops : 93.8%-95.3%

Thanks,
-- Nakata Maho http://accc.riken.jp/maho/ , http://ja.openoffice.org/
Nakata Maho's PGP public keys: http://accc.riken.jp/maho/maho.pgp.txt
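For readers following the numbers, here is a minimal sketch of how a dgemm timing is usually turned into a FLOP rate and a percentage of peak. It assumes square n x n matrices and the conventional count of 2*n^3 floating-point operations per multiply. The function names, the example timing, and the choice of 44.8 GFlops as the denominator are assumptions for illustration; the post does not say which end of the quoted peak range was used, although 32.0 / 44.8 does come out at about 71%.

```python
def dgemm_gflops(n, seconds):
    """GFLOPS for one n x n by n x n dgemm, counting 2*n**3 flops (assumed convention)."""
    return 2.0 * n ** 3 / seconds / 1e9


def percent_of_peak(measured_gflops, peak_gflops):
    """Measured rate as a percentage of the theoretical peak."""
    return 100.0 * measured_gflops / peak_gflops


# Hypothetical timing: a 4000 x 4000 multiply that takes 4 seconds.
print(dgemm_gflops(4000, 4.0))

# Figures from the post, measured against the upper end of the quoted peak range.
print(round(percent_of_peak(32.0, 44.8), 1))
print(round(percent_of_peak(42.7, 44.8), 1))
```

A real measurement would time the BLAS call itself over several runs; this only shows the arithmetic behind the table.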
Garrett Cooper
2010-Apr-12 04:47 UTC
Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Sun, Apr 11, 2010 at 9:12 PM, Maho NAKATA <chat95@mac.com> wrote:
> *Abstract*
> I compared the peak performance of FreeBSD 8.0/amd64 and Ubuntu 9.10 amd64 using dgemm
> (a linear algebra routine, matrix-matrix multiplication).
> I obtained only 70% of theoretical peak performance on FreeBSD 8/amd64 and
> almost 95% on Ubuntu 9.10 /amd64. I'm really disappointed.
>
> *Result*
> Machine: Core i7 920 (42.56-44.8Gflops) / DDR3 1066
> OS: FreeBSD 8.0/amd64 and Ubuntu 9.10
> GotoBLAS2: 1.13
>
> dgemm result
> OS      : FLOPS           : percent in peak
> FreeBSD : 32.0 GFlops     : 71%
> Ubuntu  : 42.0-42.7GFlops : 93.8%-95.3%

I'm not sure if this is the exact issue, but it might be a point of reference worth investigating:

http://lists.freebsd.org/pipermail/freebsd-hackers/2010-March/031004.html

Thanks,
-Garrett
Bruce Simpson
2010-Apr-12 09:49 UTC
Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On 04/12/10 05:12, Maho NAKATA wrote:
> *Abstract*
> I compared the peak performance of FreeBSD 8.0/amd64 and Ubuntu 9.10 amd64 using dgemm
> (a linear algebra routine, matrix-matrix multiplication).
> I obtained only 70% of theoretical peak performance on FreeBSD 8/amd64 and
> almost 95% on Ubuntu 9.10 /amd64. I'm really disappointed.

So, where's the profiling to discover why this is the case?

Also, I'm not clear on what constitutes 'theoretical peak performance' here or how it is being calculated, so figures like these come across as unscientific.

I'm sure this is something which can be resolved if someone sits down, profiles the app, and makes the necessary adjustments (e.g. calling pthread_setaffinity_np() to configure CPU affinity), if the lack of it is pessimizing your friend's app.

The PMC framework is rapidly maturing, and you can use KCacheGrind with it to visualize context switch overhead.

But I think it's expecting a bit much to post informal results to -stable in the expectation of something other than informal suggestions of what may help someone's maths-intensive application.

If there are performance issues, then reproducible results are needed, as well as some basic profiling of the system elements involved, before people can say anything either way or offer further help.

cheers,
BMS
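On the question of how the peak is calculated: the thread never spells it out, but the quoted 42.56-44.8 GFlops range is consistent with the usual cores x clock x flops-per-cycle product. The sketch below reproduces that arithmetic. The core count, both clock speeds, and the 4 double-precision flops per cycle per core are assumptions chosen because they match the quoted range, not figures stated by the original poster.

```python
def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak in GFlops: cores x clock (GHz) x flops per cycle per core."""
    return cores * ghz * flops_per_cycle


CORES = 4            # assumption: the Core i7 920 is a quad-core part
FLOPS_PER_CYCLE = 4  # assumption: one 2-wide SSE2 add plus one 2-wide multiply per cycle

base = peak_gflops(CORES, 2.66, FLOPS_PER_CYCLE)   # assumed nominal clock
turbo = peak_gflops(CORES, 2.80, FLOPS_PER_CYCLE)  # assumed all-core turbo clock

print(f"{base:.2f} - {turbo:.2f} GFlops")
```

Under these assumptions the two ends of the range correspond to the nominal and turbo clocks, which would explain why a range rather than a single number was quoted.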
Adrian Chadd
2010-Apr-12 10:00 UTC
Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Of course, what would be helpful is actually figuring out what is going on rather than some conjecture. :)

With what he said, tweaking memory allocation under FreeBSD and/or Linux would change the performance characteristics and either validate or disprove his assumptions?

Adrian
Andriy Gapon
2010-Apr-12 14:41 UTC
Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 12/04/2010 07:12 Maho NAKATA said the following:
> *Abstract*
> I compared the peak performance of FreeBSD 8.0/amd64 and Ubuntu 9.10 amd64 using dgemm
> (a linear algebra routine, matrix-matrix multiplication).
> I obtained only 70% of theoretical peak performance on FreeBSD 8/amd64 and
> almost 95% on Ubuntu 9.10 /amd64. I'm really disappointed.

Sorry about that, but the more important question (for us) is: are you willing to help us improve, in addition to reporting your results?

> *Introduction*
> I'm a friend of Gotoh Kazushige, the principal developers of GotoBLAS. He told me that
> FreeBSD is not suitable OS for scientific computing or high performance computing. He says
> (in Japanese and my translation):
>
>> I guess FreeBSD does page coloring, but I don't think FreeBSD considers very large cache
>> size which recent CPU has.

AFAIK, recent FreeBSD doesn't use page coloring anymore.

>> Support of a very large cache on Linux is still not very will
>> sophisticated, but on *BSDs, its worst; they uses too fine memory allocation method,
>> so we cannot expect large continuous physical memory allocation.

Can your friend explain these points in more technical terms? E.g. what kind of support, in his opinion, is needed for very large caches? Why, in his opinion, does the memory need to be physically contiguous?

Perhaps he is talking about support for large pages (2M) and the related improvements in TLB performance. If so, he (and you) may want to read about the 'superpages' feature of FreeBSD. I am not sure if it is enabled by default in 8.0; you can check vm.pmap.pg_ps_enabled.

>> Moreover, process scheduling is not so nice as *BSD employs an algorithm that
>> changes physical CPUs in turn instead of allocating one core for such kind of jobs.
>> Take your own benchmark, and you'll see..

Here I can only add an anecdotal 'me too'. Sometimes I run single-threaded, CPU-heavy programs like ffmpeg transcoding on an otherwise idle system (a bunch of system daemons in the background), and I see that the CPU-consuming process frequently goes back and forth between my two cores. User CPU loads on the cores are something like 60% vs 40%. My expectation was that the process would mostly run on one core while the rest of the threads would mostly run on the other. I am not sure if that core switching really hurts performance or if there is something wrong with it, but somehow it seems 'counter-intuitive'.

> *Result*
> Machine: Core i7 920 (42.56-44.8Gflops) / DDR3 1066
> OS: FreeBSD 8.0/amd64 and Ubuntu 9.10
> GotoBLAS2: 1.13
>
> dgemm result
> OS      : FLOPS           : percent in peak
> FreeBSD : 32.0 GFlops     : 71%
> Ubuntu  : 42.0-42.7GFlops : 93.8%-95.3%

It would also be good to learn more about your program. How much memory does it typically use, and how does it allocate it? Is it single-threaded or not? If not, how many threads does it have, what do they do, and how do they communicate?

--
Andriy Gapon
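One way to test whether core switching matters is to pin the benchmark to a single CPU and compare timings. The sketch below is a hedged illustration only: it uses Python's os.sched_setaffinity, which the Python documentation lists as available on only some Unix platforms and which may well be absent on FreeBSD. The helper name pin_to_one_cpu is hypothetical, and a C program would more likely use pthread_setaffinity_np() as suggested earlier in the thread.

```python
import os


def pin_to_one_cpu(pid=0):
    """Try to restrict a process to one CPU; return that CPU, or None if not possible."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # this platform's Python does not expose the call
    try:
        cpu = min(os.sched_getaffinity(pid))  # lowest CPU we are currently allowed to use
        os.sched_setaffinity(pid, {cpu})
        # Report success only if the kernel confirms the new mask.
        return cpu if os.sched_getaffinity(pid) == {cpu} else None
    except OSError:
        return None
```

Calling pin_to_one_cpu() before the timed region and comparing against an unpinned run would show whether migration costs anything measurable. On FreeBSD the same experiment can be done without code changes by launching the program under cpuset(1), for example `cpuset -l 0 ./benchmark`, where ./benchmark stands for whatever binary is being measured.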
Andriy Gapon
2010-Apr-12 14:49 UTC
Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 12/04/2010 17:41 Andriy Gapon said the following:
> It would also be good to learn more about your program.
> How much memory does it typically use, how does it allocate it?
> Is it single-threaded or not? If not, how many threads does it have and what do
> they do, how do they communicate?

Another question: which compilers (which versions of GCC) were used on the two systems to compile the program?

--
Andriy Gapon
Alan Cox
2010-Apr-13 05:30 UTC
Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Sun, Apr 11, 2010 at 11:12 PM, Maho NAKATA <chat95@mac.com> wrote:
> *Introduction*
> I'm a friend of Gotoh Kazushige, the principal developers of GotoBLAS. He told me that
> FreeBSD is not suitable OS for scientific computing or high performance computing. He says
> (in Japanese and my translation):
>
> > I guess FreeBSD does page coloring, but I don't think FreeBSD considers very large cache
> > size which recent CPU has. Support of a very large cache on Linux is still not very will
> > sophisticated, but on *BSDs, its worst; they uses too fine memory allocation method,
> > so we cannot expect large continuous physical memory allocation.

These statements about FreeBSD's memory management are wrong, or at least outdated. FreeBSD is very likely to allocate physical memory in contiguous chunks to your memory-hungry application even if automatic superpage promotion does not occur.

You should refer your friend to my paper at http://www.usenix.org/events/osdi02/tech/full_papers/navarro/navarro_html/ and tell him that FreeBSD >= 7.2 implements a variation on what that paper describes.

Regards,
Alan
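The vm.pmap.pg_ps_enabled sysctl mentioned earlier in the thread can be checked from a script before running a benchmark, so that results are recorded together with the superpage setting. This is a minimal sketch that shells out to sysctl(8) with -n; the function names are hypothetical, and it assumes the value is reported as 0 or 1. On systems where the sysctl or the command does not exist, it returns None rather than guessing.

```python
import subprocess


def parse_flag(text):
    """Map sysctl output '0' or '1' to False or True; anything else to None."""
    text = text.strip()
    if text not in ("0", "1"):
        return None
    return text == "1"


def superpages_enabled(name="vm.pmap.pg_ps_enabled"):
    """Return True/False for the named sysctl, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", name],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # no sysctl binary, or no such variable on this system
    return parse_flag(result.stdout)
```

Logging the return value of superpages_enabled() next to each measured GFlops figure would make it easier to tell whether superpages play any role in the difference being reported.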
Pieter de Goeje
2010-Apr-14 14:17 UTC
How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Wednesday 14 April 2010 15:19:13 Andriy Gapon wrote:
> on 14/04/2010 02:21 Maho NAKATA said the following:
> > 2. install ports/math/gotoblas (manual download required)
> > make install
>
> Do you know how gotoblas on Linux was obtained?
> Was it built from source?
> Has it come pre-packaged?
> If so, can you find out details of its build configuration?
>
> Thanks!

I think the best test would be to run a statically compiled Linux binary on FreeBSD. That way the compiler settings are exactly the same.

- Pieter