Hi all,

While doing some benchmarking of Xen, I ran across a couple performance issues. I am wondering if anyone else has noticed this and whether there is anything I can do to tune the performance.

The setup:
CPU: Athlon XP 2500+ (1826.005 MHz)
RAM: Limited to 256 MB in native and xenU
Disk: Maxtor 6B200P0, ATA DISK drive
Motherboard: ASUS A7VBX-MX SE
Network: tested only loopback interface.

I have Fedora Core 4 installed as dom0, with Scientific Linux 3.0.7 (RHEL3) installed on a separate partition as the single domU. I installed the FC4 xen rpms (xen-3.0-0.20050912.fc4, kernel-xenU-2.6.12-1.1454_FC4, kernel-xen0-2.6.12-1.1454_FC4) using yum.

I used the following benchmark tools/suites:
bonnie++-1.03a
UnixBench 4.1.0
ab
lmbench 3.0-a5

The areas where I saw the greatest performance hit were in system calls, process creation, and pipe throughput. Here are some selected results:

UnixBench:
===========

Scientific Linux 3 Native:
BYTE UNIX Benchmarks (Version 4.1.0)
System -- Linux localhost.localdomain 2.4.21-27.0.2.EL #1 Tue Jan 18 20:27:31 CST 2005 i686 athlon i386 GNU/Linux
Start Benchmark Run: Thu Sep 22 15:23:17 PDT 2005
 2 interactive users.
 15:23:17 up 12 min, 2 users, load average: 0.03, 0.08, 0.05
lrwxr-xr-x 1 root root 4 Sep 9 10:56 /bin/sh -> bash
/bin/sh: symbolic link to bash
/dev/hdc11 20161172 5059592 14077440 27% /
<--snip-->
System Call Overhead              995605.1 lps (10.0 secs, 10 samples)
Pipe Throughput                  1135376.3 lps (10.0 secs, 10 samples)
Pipe-based Context Switching      375521.7 lps (10.0 secs, 10 samples)
Process Creation                    9476.4 lps (30.0 secs, 3 samples)
Execl Throughput                    2918.3 lps (29.7 secs, 3 samples)
<--snip-->
                     INDEX VALUES
TEST                                      BASELINE     RESULT    INDEX

Dhrystone 2 using register variables      116700.0  4307104.5    369.1
Double-Precision Whetstone                    55.0      980.4    178.3
Execl Throughput                              43.0     2918.3    678.7
File Copy 1024 bufsize 2000 maxblocks       3960.0   143780.0    363.1
File Copy 256 bufsize 500 maxblocks         1655.0    72156.0    436.0
File Copy 4096 bufsize 8000 maxblocks       5800.0   192427.0    331.8
Pipe Throughput                            12440.0  1135376.3    912.7
Process Creation                             126.0     9476.4    752.1
Shell Scripts (8 concurrent)                   6.0      329.7    549.5
System Call Overhead                       15000.0   995605.1    663.7
                                                              ========
     FINAL SCORE                                                 475.2

--------------------------------------------

SL3 XenU
BYTE UNIX Benchmarks (Version 4.1.0)
System -- Linux localhost.localdomain 2.6.12-1.1454_FC4xenU #1 SMP Fri Sep 9 00:45:34 EDT 2005 i686 athlon i386 GNU/Linux
Start Benchmark Run: Fri Sep 23 09:08:23 PDT 2005
 1 interactive users.
 09:08:23 up 0 min, 1 user, load average: 0.95, 0.25, 0.08
lrwxr-xr-x 1 root root 4 Sep 9 10:56 /bin/sh -> bash
/bin/sh: symbolic link to bash
/dev/sda1 20161172 5058964 14078068 27% /
<--snip-->
System Call Overhead              969225.3 lps (10.0 secs, 10 samples)
Pipe Throughput                   619270.7 lps (10.0 secs, 10 samples)
Pipe-based Context Switching       85183.9 lps (10.0 secs, 10 samples)
Process Creation                    3014.6 lps (30.0 secs, 3 samples)
Execl Throughput                    1807.4 lps (29.9 secs, 3 samples)
<--snip-->
                     INDEX VALUES
TEST                                      BASELINE     RESULT    INDEX

Dhrystone 2 using register variables      116700.0  4288647.9    367.5
Double-Precision Whetstone                    55.0      976.3    177.5
Execl Throughput                              43.0     1807.4    420.3
File Copy 1024 bufsize 2000 maxblocks       3960.0   143559.0    362.5
File Copy 256 bufsize 500 maxblocks         1655.0    70328.0    424.9
File Copy 4096 bufsize 8000 maxblocks       5800.0   186297.0    321.2
Pipe Throughput                            12440.0   619270.7    497.8
Process Creation                             126.0     3014.6    239.3
Shell Scripts (8 concurrent)                   6.0      188.0    313.3
System Call Overhead                       15000.0   969225.3    646.2
                                                              ========
     FINAL SCORE                                                 356.0

---------------------------------------------------------------------------------

lmbench Selected Results:
=========================

SL3 Native:
<--snip-->
Simple syscall: 0.1516 microseconds
Simple read: 0.2147 microseconds
Simple write: 0.1817 microseconds
Simple stat: 1.8486 microseconds
Simple fstat: 0.3026 microseconds
Simple open/close: 2.2201 microseconds
<--snip-->
Protection fault: 0.2196 microseconds
Pipe latency: 2.2539 microseconds
AF_UNIX sock stream latency: 4.8221 microseconds
Process fork+exit: 143.7297 microseconds
Process fork+execve: 483.0833 microseconds
Process fork+/bin/sh -c: 1884.0000 microseconds

-------------------------------------------------

SL3 XenU:
<--snip-->
Simple syscall: 0.1671 microseconds
Simple read: 0.4090 microseconds
Simple write: 0.3588 microseconds
Simple stat: 3.5761 microseconds
Simple fstat: 0.5530 microseconds
Simple open/close: 3.9425 microseconds
<--snip-->
Protection fault: 0.5993 microseconds
Pipe latency: 12.1886 microseconds
AF_UNIX sock stream latency: 22.3485 microseconds
Process fork+exit: 365.8667 microseconds
Process fork+execve: 1066.4000 microseconds
Process fork+/bin/sh -c: 3826.0000 microseconds
<--snip-->

-------------------------------------------------------------------------

I can post the full results of these tests if anyone is interested.

Does anyone have any ideas for tuning the performance of the domUs? Are there any configurations that perform better than others?

Thank You,
Angela Norton

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
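To make the size of each hit easier to compare, the selected figures above can be turned into native-vs-xenU slowdown factors. The sketch below only re-uses numbers copied from the output in this message; the function and variable names are made up for illustration. UnixBench reports loops per second (higher is better), lmbench reports microseconds (lower is better), so the ratio is taken in opposite directions for the two suites.

```python
# Figures copied from the UnixBench and lmbench output in the post above.
# Each entry is (native, xenU).

# UnixBench: loops per second, higher is better.
UNIXBENCH_LPS = {
    "System Call Overhead": (995605.1, 969225.3),
    "Pipe Throughput": (1135376.3, 619270.7),
    "Pipe-based Context Switching": (375521.7, 85183.9),
    "Process Creation": (9476.4, 3014.6),
    "Execl Throughput": (2918.3, 1807.4),
}

# lmbench: latency in microseconds, lower is better.
LMBENCH_US = {
    "Simple syscall": (0.1516, 0.1671),
    "Simple read": (0.2147, 0.4090),
    "Simple write": (0.1817, 0.3588),
    "Protection fault": (0.2196, 0.5993),
    "Pipe latency": (2.2539, 12.1886),
    "AF_UNIX sock stream latency": (4.8221, 22.3485),
    "Process fork+exit": (143.7297, 365.8667),
    "Process fork+execve": (483.0833, 1066.4000),
    "Process fork+/bin/sh -c": (1884.0000, 3826.0000),
}


def slowdown_from_rate(native, xenu):
    """Slowdown factor when the metric is a rate (higher is better)."""
    return native / xenu


def slowdown_from_latency(native, xenu):
    """Slowdown factor when the metric is a latency (lower is better)."""
    return xenu / native


def all_slowdowns():
    """Return {test name: slowdown factor}, where 1.0 means no difference."""
    result = {}
    for name, (native, xenu) in UNIXBENCH_LPS.items():
        result["UnixBench " + name] = slowdown_from_rate(native, xenu)
    for name, (native, xenu) in LMBENCH_US.items():
        result["lmbench " + name] = slowdown_from_latency(native, xenu)
    return result


if __name__ == "__main__":
    # Print the tests sorted from largest to smallest slowdown.
    for name, factor in sorted(all_slowdowns().items(), key=lambda kv: -kv[1]):
        print(f"{factor:5.2f}x  {name}")
```

Sorting by factor puts the pipe, socket and context-switching tests at the top and the plain system call tests near 1.0x, which is a quick way to see which results deserve the most attention.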
Angela,

I'm not sure what you EXPECTED to see. A virtual machine will always be (somewhat) slower than the "real" hardware, because you have an extra software layer for some operations. Basically, this is the price you pay for the extended system functionality that you get. It's the same as saying "If I remove the file-system from my Operating System, I can read from or write to the disk much quicker than going through the file-system"[1]. You gain some functionality, and you lose some performance.

Comments on Byte-Bench:
I can't explain the pipe throughput, because I just don't know anything about how that works.

Process creation involves a lot of page-table work, which is a typical situation where the hypervisor (Xen) has to take extra action on top of what's normally done in the OS: each operation that would normally be a trivial write to a page-table entry has now become a call into Xen to perform the "trivial operation". So instead of a few simple operations, we now have a software interrupt, a function call and several extra operations just to find out what needs to be done, and then the actual page-table update. I expect this to be an order of magnitude slower than the native operations.

My guess is that shell scripts aren't slower in themselves, but that several new processes are created within each shell script.

Comments on lmbench:
read & write are slower - no big surprise. Most likely the reads & writes go to a file, which commonly is emulated through the loopback-mounted file that is the DomU's "disk". So you get twice the number of reads: one in Dom0 reading the disk image, and then the data is transferred to DomU through a "read" operation. It's similar for the other file-related operations: they become two-step operations, with Dom0 doing the actual work and then transferring the result to DomU.

Protection fault handling goes through extra steps, as the fault first enters Xen itself and then has to be passed back to the actual guest that prot-faulted, so it's expected that those take longer than the same operation in a native OS.

I still have no explanation for the pipe behaviour - in the few minutes I've been working on this answer, I haven't learnt how pipes work ;-)

Sockets: probably related to pipes... But I have no real idea how pipes or sockets work...

fork+<something>: More work is needed in a virtual machine than on the real hardware, as described under process creation above. A factor of ~2x slower isn't bad at all... Some of these operations also involve file operations, which adds to the already slower operation.

[1] This assumes the file-system is relatively stupid about caching things, because a modern file-system performs a lot of clever caching/optimisation to increase the system performance.

--
Mats

________________________________
From: xen-users-bounces@lists.xensource.com [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Angela Norton
Sent: 11 October 2005 21:51
To: xen-users@lists.xensource.com
Subject: [Xen-users] Xen performance

Hi all,

While doing some benchmarking of Xen, I ran across a couple performance issues. I am wondering if anyone else has noticed this and whether there is anything I can do to tune the performance.

The setup:
CPU: Athlon XP 2500+ (1826.005 MHz)
RAM: Limited to 256 MB in native and xenU
Disk: Maxtor 6B200P0, ATA DISK drive
Motherboard: ASUS A7VBX-MX SE
Network: tested only loopback interface.

I have Fedora Core 4 installed as dom0, with Scientific Linux 3.0.7 (RHEL3) installed on a separate partition as the single domU. I installed the FC4 xen rpms (xen-3.0-0.20050912.fc4, kernel-xenU-2.6.12-1.1454_FC4, kernel-xen0-2.6.12-1.1454_FC4) using yum.
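For anyone who wants to poke at the two operations discussed in this reply, here is a minimal sketch of a fork+exit timer and a pipe round-trip timer. It is not the lmbench code, only a rough imitation of the idea: lmbench's pipe latency test bounces a token between two processes over a pair of pipes, so every round trip forces the kernel to switch between the two processes, and those switches are a plausible (not confirmed here) place for the extra xenU cost to show up. The function names and iteration counts are made up, and because this runs in Python the absolute numbers will be much higher than lmbench's; only the native-vs-domU ratio is of interest.

```python
import os
import time


def fork_exit_latency_us(iterations=200):
    """Average microseconds to fork a child that exits at once, then reap it."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: leave immediately, skipping cleanup handlers
        os.waitpid(pid, 0)     # parent: wait for the child to finish
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def pipe_round_trip_us(iterations=2000):
    """Average microseconds for one byte to go parent -> child -> parent."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte back until the parent closes its write end.
        os.close(to_child_w)
        os.close(to_parent_r)
        while True:
            data = os.read(to_child_r, 1)
            if not data:
                break
            os.write(to_parent_w, data)
        os._exit(0)

    # Parent: keep only the ends it uses.
    os.close(to_child_r)
    os.close(to_parent_w)

    start = time.perf_counter()
    for _ in range(iterations):
        os.write(to_child_w, b"x")   # hand the token to the child
        os.read(to_parent_r, 1)      # block until the child hands it back
    elapsed = time.perf_counter() - start

    os.close(to_child_w)             # EOF tells the child to exit
    os.waitpid(pid, 0)
    os.close(to_parent_r)
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"fork+exit:       {fork_exit_latency_us():10.1f} us")
    print(f"pipe round trip: {pipe_round_trip_us():10.1f} us")
```

Running the same script once on the native install and once inside the domU gives a pair of numbers that can be compared in the same way as the lmbench "Process fork+exit" and "Pipe latency" lines.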
On 10/11/05, Angela Norton <anorton@uvic.ca> wrote:
>
> Hi all,
> While doing some benchmarking of Xen, I ran across a couple performance
> issues. I am wondering if anyone else has noticed this and whether there is
> anything I can do to tune the performance.

About your performance:
- You should use LVM volumes for your guest systems; that should give better I/O performance.
- Disable TLS (on Xen this usually means moving /lib/tls out of the way in the guest).
- For more I/O performance: change the FS, tune the FS...

About the issues found, I can't comment, but you could probably compare those results to VMware or QEMU, to assess whether the performance should be better.
Like the other reply says, bare hardware is always faster than running with an extra operating system layer doing virtualization.
It would be fairer to compare performance between virtualization technologies. Of course, hardware performance could be used as a baseline.

> The setup:
> CPU: Athlon XP 2500+ (1826.005 MHz)
> RAM: Limited to 256 MB in native and xenU
> Disk: Maxtor 6B200P0, ATA DISK drive
> Motherboard: ASUS A7VBX-MX SE
> Network: tested only loopback interface.
>
> I have Fedora Core 4 installed as dom0, with Scientific Linux 3.0.7 (RHEL3)
> installed on a separate partition as the single domU. I installed
> the FC4 xen rpms (xen-3.0-0.20050912.fc4, kernel-xenU-2.6.12-1.1454_FC4,
> kernel-xen0-2.6.12-1.1454_FC4) using yum.
>
> I used the following benchmark tools/suites:
> bonnie++-1.03a
> UnixBench 4.1.0
> ab
> lmbench 3.0-a5
>
> The areas where I saw the greatest performance hit were in system calls,
> process creation, and pipe throughput. Here are some selected results:
>
> UnixBench:
> ===========
>
> Scientific Linux 3 Native:
> BYTE UNIX Benchmarks (Version 4.1.0)
> System -- Linux localhost.localdomain 2.4.21-27.0.2.EL #1 Tue Jan 18
> 20:27:31 CST 2005 i686 athlon i386 GNU/Linux
> Start Benchmark Run: Thu Sep 22 15:23:17 PDT 2005
> 2 interactive users.
--
Miguel Sousa Filipe
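As a rough illustration of the LVM suggestion: xm domain config files use Python syntax, and the disk line decides whether the guest's disk is a real block device handed through from dom0 (phy:) or a loopback-mounted image file (file:). The fragment below is only a sketch. The volume group and logical volume names (vg0, sl3-root), the image path, the domain name and the kernel path are all made up for illustration; only the 256 MB limit and the sda1 device name come from the original post.

```python
# Hypothetical domU config fragment (xm config files are plain Python assignments).

# Kernel path is an assumption based on the kernel-xenU RPM version in the thread.
kernel = "/boot/vmlinuz-2.6.12-1.1454_FC4xenU"
memory = 256          # MB, matching the limit used in the benchmarks
name = "sl3"          # made-up domain name

# Block device passed through from dom0: an LVM logical volume.
# "vg0" and "sl3-root" are invented names; sda1 is the device the guest sees.
disk = ["phy:/dev/vg0/sl3-root,sda1,w"]

# For comparison, a loopback-mounted image file would look like this
# (the path is invented; this form adds the dom0 file-system layer):
# disk = ["file:/var/xen/sl3-root.img,sda1,w"]

root = "/dev/sda1 ro"
```

A plain partition, as in the original setup, is exported with the same phy: prefix, so switching to LVM mostly helps when moving away from file-backed images or when volumes need to be resized later.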