The release of a new code generator in Mono 2.2 prompted me to benchmark the performance of various VMs using the SciMark2 benchmark on an 8x 2.1GHz 64-bit Opteron and I have published the results here:

http://flyingfrogblog.blogspot.com/2009/01/mono-22.html

The LLVM results were generated using llvm-gcc 4.2.1 on the C version of SciMark2 with the following command-line options:

  llvm-gcc -Wall -lm -O2 -funroll-loops *.c -o scimark2

Mono was up to 12x slower than LLVM before and is now only 2.2x slower on average. Interestingly, the JVM scores slightly higher than LLVM on this benchmark on average and beats LLVM on two of the five individual tests.

The individual scores are particularly enlightening. Specifically:

. LLVM outperforms all other VMs by a significant margin on FFT, Monte Carlo and sparse matrix multiply.

. LLVM is beaten by the JVM on successive over-relaxation (SOR) and LU decomposition.

In the context of the SOR test, I suspect the JVM is using alias information to perform optimizations that LLVM and llvm-gcc probably do not do.

I am not sure what causes the performance discrepancy on LU. Perhaps the JVM is generating SSE instructions. Does llvm-gcc generate SSE instructions under any circumstances?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
----- Original Message -----
From: "Jon Harrop" <jon at ffconsultancy.com>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Saturday, January 31, 2009 6:56 AM
Subject: [LLVMdev] Performance vs other VMs

> The LLVM results were generated using llvm-gcc 4.2.1 on the C version of
> SciMark2 with the following command-line options:
>
>   llvm-gcc -Wall -lm -O2 -funroll-loops *.c -o scimark2

Interesting, but can you add plain C compiled with good old-fashioned GCC or similar to serve as a point of reference as well?
On Saturday 31 January 2009 02:17:31 BGB wrote:
> Interesting, but can you add plain C compiled with good old-fashioned
> GCC or similar to serve as a point of reference as well?

This is the highest composite score I have been able to get with gcc 4.3.2:

$ gcc -Wall -lm -O3 -march=barcelona -funroll-all-loops *.c -o scimark2
$ ./scimark2
Composite Score:          708.63
FFT             Mflops:   573.76    (N=1024)
SOR             Mflops:   481.74    (100 x 100)
MonteCarlo:     Mflops:   129.06
Sparse matmult  Mflops:   775.57    (N=1000, nz=5000)
LU              Mflops:  1583.00    (M=100, N=100)

One reason for the difference may be that the version of llvm-gcc that I am using does not recognise -march=barcelona for this CPU, whereas gcc does.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
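The composite figure in the gcc run above is consistent with a plain arithmetic mean of the five kernel scores. The sketch below reproduces that calculation. The function name `composite_score` is hypothetical, and the averaging rule is an assumption inferred from the reported numbers, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of the per-kernel Mflops figures.
   Assumption: the composite score is the unweighted average of the kernels. */
double composite_score(const double *mflops, size_t n)
{
    assert(mflops != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mflops[i];
    return sum / (double)n;
}
```

Feeding it the five gcc figures quoted above (573.76, 481.74, 129.06, 775.57, 1583.00) gives roughly 708.63, which matches the reported composite. A single fast kernel such as LU can therefore dominate the headline number.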
This is not quite a fair comparison. The other virtual machines must be doing garbage collection, while LLVM, since it is running C code, has the advantage of manual memory allocation.

On Fri, Jan 30, 2009 at 9:56 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
> The release of a new code generator in Mono 2.2 prompted me to benchmark the
> performance of various VMs using the SciMark2 benchmark on an 8x 2.1GHz
> 64-bit Opteron and I have published the results here:
>
> http://flyingfrogblog.blogspot.com/2009/01/mono-22.html
Here is a run of scimark2 with verbose GC enabled. You'll see that there are two garbage collection cycles, for a total of around 0.003 seconds. It should also be noted that these GCs happen before the timer starts running. There is almost no dynamic memory allocation in this code. Modern garbage collectors are also very efficient (sometimes better than hand deallocation).

java -verbose:gc jnt/scimark2/commandline
[GC 511K->202K(1984K), 0.0018845 secs]
[GC 714K->415K(1984K), 0.0015513 secs]

SciMark 2.0a

Composite Score: 327.3062235870194
FFT (1024): 127.42845375506063
SOR (100x100): 677.3128255261597
Monte Carlo : 29.4337095721763
Sparse matmult (N=1000, nz=5000): 300.2107071278524
LU (100x100): 502.14542195384803

java.vendor: Apple Inc.
java.version: 1.5.0_16
os.arch: i386
os.name: Mac OS X
os.version: 10.5.6

On Jan 31, 2009, at 11:25 PM, Ramón García wrote:
> This is not quite a fair comparison. The other virtual machines must be
> doing garbage collection, while LLVM, since it is running C code, has
> the advantage of manual memory allocation.
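The observation that allocation (and any resulting collection) happens before the timer starts can be illustrated with a small timing sketch. This is not the SciMark2 harness. `scale_kernel` and `time_kernel` are hypothetical names, and the kernel is a stand-in chosen only to show where the timed region begins and ends.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a benchmark kernel: scales a vector in place. */
static void scale_kernel(double *x, size_t n, double a)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

/* Returns the CPU seconds spent in the kernel only. Allocation,
   initialisation and deallocation all sit outside the timed region.
   Returns -1.0 if the allocation fails. */
double time_kernel(size_t n, int repeats)
{
    assert(n > 0 && repeats > 0);

    double *x = malloc(n * sizeof *x);   /* allocated before the timer starts */
    if (x == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        x[i] = 1.0;

    clock_t start = clock();             /* timed region begins here */
    for (int r = 0; r < repeats; r++)
        scale_kernel(x, n, 1.0000001);
    clock_t stop = clock();              /* timed region ends here */

    volatile double sink = x[0];         /* discourages the compiler from dropping the loop */
    (void)sink;
    free(x);                             /* freed after the timer stops */
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

With this layout the cost of `malloc` and `free`, or of a collector in a managed runtime, only affects the measurement if memory is allocated inside the timed loop.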
On Sunday 01 February 2009 05:25:40 Ramón García wrote:
> This is not quite a fair comparison. The other virtual machines must be
> doing garbage collection, while LLVM, since it is running C code, has
> the advantage of manual memory allocation.

That is an insignificant advantage in this particular case (SciMark2). The memory for each test is preallocated and is not part of the measurement, and the heap and stack are both tiny during the computations, so there is little for a collector to traverse.

I am interested in the comparative results for LLVM because I take them as an indication of how fast my LLVM-based VM might be compared to other garbage-collected VMs.

However, LLVM has a serious disadvantage compared to the other VMs here because it does not have aliasing assurances. For example, it does not know that the subarrays in the successive over-relaxation test cannot overlap.

The LLVM 2.1 release notes say that llvm-gcc got alias analysis and understood the "restrict" keyword, but when I add it to the C code for SciMark2 it makes no difference. Can anyone else get this to work?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
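For readers who want to try the same experiment, here is a minimal sketch of an SOR sweep with `restrict`-qualified row pointers. It follows the general shape of a successive over-relaxation kernel, but it is not the SciMark2 source. The function name `sor_sweep` is hypothetical, and the sketch assumes every row of `G` is separately allocated storage, so the rows really do not overlap.

```c
#include <assert.h>

/* One SOR sweep over the interior of an M x N grid stored as an array of rows.
   The restrict qualifiers promise the compiler that, within one iteration of
   the outer loop, the three row pointers refer to non-overlapping storage.
   Assumption: each G[i] points to its own distinct row. */
void sor_sweep(int M, int N, double omega, double **G)
{
    assert(G != 0 && M >= 3 && N >= 3);
    const double omega_over_four = omega * 0.25;
    const double one_minus_omega = 1.0 - omega;

    for (int i = 1; i < M - 1; i++) {
        double *restrict Gi = G[i];              /* row being updated */
        const double *restrict Gim1 = G[i - 1];  /* row above, read only */
        const double *restrict Gip1 = G[i + 1];  /* row below, read only */

        for (int j = 1; j < N - 1; j++)
            Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j - 1] + Gi[j + 1])
                  + one_minus_omega * Gi[j];
    }
}
```

Even with the aliasing promise, the inner loop still carries a dependence through `Gi[j - 1]`, so any gain would come from the compiler no longer assuming that a store to `Gi` might change the neighbouring rows. Whether a particular compiler exploits `restrict` on block-scope pointers like these is implementation-specific, so comparing the generated assembly with and without the qualifier is the most direct check.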