xuehai zhang
2005-Nov-28 08:21 UTC
[Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Dear all,

While debugging the execution performance of an application with strace, I found that some system calls, such as open and stat64, run faster on XenLinux than on standard Linux. The following is the output of running "strace -c /bin/sh -c /bin/echo foo" on both systems. An open call takes 109 usecs on average on standard Linux but only 41 usecs on XenLinux, and a stat64 call takes 75 usecs on standard Linux but only 19 usecs on XenLinux.

The Xen VM runs on the same physical machine as the standard Linux. It uses loopback files in dom0 as the backends of its VBDs.

Any insight is highly appreciated.

Thanks.
Xuehai

XenLinux:
# strace -c /bin/sh -c /bin/echo foo

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 39.05    0.001972        1972         1           write
 19.35    0.000977         977         1           execve
  7.74    0.000391          24        16           old_mmap
  7.23    0.000365          41         9         2 open
  6.06    0.000306          51         6           read
  5.17    0.000261          10        26           brk
  4.93    0.000249          83         3           munmap
  1.98    0.000100          14         7           close
  1.90    0.000096           9        11           rt_sigaction
  1.52    0.000077          11         7           fstat64
  1.01    0.000051           9         6           rt_sigprocmask
  0.95    0.000048          24         2           access
  0.81    0.000041          14         3           uname
  0.75    0.000038          19         2           stat64
  0.38    0.000019          19         1           ioctl
  0.18    0.000009           9         1           time
  0.18    0.000009           9         1           getppid
  0.16    0.000008           8         1           getpgrp
  0.16    0.000008           8         1           getuid32
  0.14    0.000007           7         1           getgid32
  0.12    0.000006           6         1           getpid
  0.12    0.000006           6         1           geteuid32
  0.12    0.000006           6         1           getegid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.005050                   109         2 total

Standard Linux:
# strace -c /bin/sh -c /bin/echo foo

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 22.90    0.000982         109         9         2 open
 22.85    0.000980         980         1           execve
 10.87    0.000466          29        16           old_mmap
 10.45    0.000448         448         1           write
  7.06    0.000303          51         6           read
  6.67    0.000286          10        30           brk
  3.61    0.000155          78         2           access
  3.50    0.000150          75         2           stat64
  2.91    0.000125          42         3           munmap
  2.24    0.000096          14         7           close
  2.12    0.000091          13         7           fstat64
  1.84    0.000079           7        11           rt_sigaction
  1.03    0.000044           7         6           rt_sigprocmask
  0.72    0.000031          10         3           uname
  0.19    0.000008           8         1           geteuid32
  0.16    0.000007           7         1           time
  0.16    0.000007           7         1           getppid
  0.16    0.000007           7         1           getpgrp
  0.16    0.000007           7         1           getuid32
  0.14    0.000006           6         1           getpid
  0.14    0.000006           6         1           getgid32
  0.12    0.000005           5         1           getegid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.004289                   112         2 total
Petersson, Mats
2005-Nov-28 14:36 UTC
RE: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of xuehai zhang
> Sent: 28 November 2005 08:21
> To: Xen Mailing List
> Subject: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
>
> While debugging the execution performance of an application with strace, I found
> that some system calls, such as open and stat64, run faster on XenLinux than on
> standard Linux. [snip]
> The Xen VM runs on the same physical machine as the standard Linux. It uses
> loopback files in dom0 as the backends of its VBDs.
> Any insight is highly appreciated.

And are you using the same loopback file-system when running native Linux? If not, it's probably the main cause of the difference.

--
Mats

> Thanks.
> Xuehai

[snip of tables]
Anthony Liguori
2005-Nov-28 14:38 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
This may just be the difference between having the extra level of block caching from using a loopback device.

Try running the same benchmark on a domain that uses an actual partition. While the syscalls may appear to be faster, I imagine it's because the cost of pulling in a block has already been paid, so the overall workload is unaffected.

Regards,

Anthony Liguori

xuehai zhang wrote:
> While debugging the execution performance of an application with strace, I found
> that some system calls, such as open and stat64, run faster on XenLinux than on
> standard Linux. The following is the output of running "strace -c /bin/sh -c
> /bin/echo foo" on both systems. An open call takes 109 usecs on average on
> standard Linux but only 41 usecs on XenLinux, and a stat64 call takes 75 usecs on
> standard Linux but only 19 usecs on XenLinux.
> The Xen VM runs on the same physical machine as the standard Linux. It uses
> loopback files in dom0 as the backends of its VBDs.
> Any insight is highly appreciated.
> Thanks.
> Xuehai
>
> [snip of tables]
Petersson, Mats
2005-Nov-28 14:45 UTC
RE: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Anthony Liguori
> Sent: 28 November 2005 14:39
> To: xuehai zhang
> Cc: Xen Mailing List
> Subject: Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
>
> This may just be the difference between having the extra level of block caching
> from using a loopback device.
>
> Try running the same benchmark on a domain that uses an actual partition. While
> the syscalls may appear to be faster, I imagine it's because the cost of pulling
> in a block has already been paid, so the overall workload is unaffected.

And would this be the same as running standard Linux with the loopback file-system mounted and a chroot into that file-system, or would that be different? [I'm asking because I don't actually understand enough about how it works to know what difference it makes, and I would like to know, because at some point I'll probably need to know this.]

--
Mats

> Regards,
>
> Anthony Liguori
>
> xuehai zhang wrote:

[snip]
xuehai zhang
2005-Nov-28 14:47 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Petersson, Mats wrote:
>> While debugging the execution performance of an application with strace, I found
>> that some system calls, such as open and stat64, run faster on XenLinux than on
>> standard Linux. [snip]
>> The Xen VM runs on the same physical machine as the standard Linux. It uses
>> loopback files in dom0 as the backends of its VBDs.
>> Any insight is highly appreciated.
>
> And are you using the same loopback file-system when running native Linux? If
> not, it's probably the main cause of the difference.

No, the native Linux runs on a regular Linux machine and uses raw hard disk partitions.

Xuehai
xuehai zhang
2005-Nov-28 14:59 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Anthony,

Thanks a lot for the reply.

> This may just be the difference between having the extra level of block caching
> from using a loopback device.

How can I find out exactly how the extra level of block caching is handled in the loopback device? Is there any way to disable the extra block caching when using a loopback device?

> Try running the same benchmark on a domain that uses an actual partition. While
> the syscalls may appear to be faster, I imagine it's because the cost of pulling
> in a block has already been paid, so the overall workload is unaffected.

I will try to test with a domain using actual partitions and report the results.

Thanks again.

Xuehai

> xuehai zhang wrote:
>> [snip of original message and tables]
xuehai zhang
2005-Nov-28 15:28 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Petersson, Mats wrote:
>> This may just be the difference between having the extra level of block caching
>> from using a loopback device.
>>
>> Try running the same benchmark on a domain that uses an actual partition. While
>> the syscalls may appear to be faster, I imagine it's because the cost of pulling
>> in a block has already been paid, so the overall workload is unaffected.
>
> And would this be the same as running standard Linux with the loopback
> file-system mounted and a chroot into that file-system, or would that be
> different? [I'm asking because I don't actually understand enough about how it
> works to know what difference it makes, and I would like to know, because at some
> point I'll probably need to know this.]

Mats,

I will do this experiment and see what happens (as I promised in the popen thread).

Thanks.

Xuehai
xuehai zhang
2005-Nov-28 15:50 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Mats,

I mounted the loopback file in dom0, chrooted to the mountpoint and redid the experiment. The results are attached below. The times of the open and stat64 calls are similar to the XenLinux case and much smaller than in the standard Linux case. So either using the loopback file as the backend of XenLinux or mounting it directly in the local filesystem yields some benefit (maybe just caused by the extra layer of block caching) for the performance of some system calls.

# strace -c /bin/sh -c /bin/echo foo

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 21.93    0.000490         490         1           write
 16.34    0.000365          24        15           old_mmap
 15.26    0.000341          38         9         3 open
  9.62    0.000215          43         5           read
  7.97    0.000178          10        18           brk
  7.79    0.000174          87         2           munmap
  4.07    0.000091           8        11           rt_sigaction
  3.27    0.000073          12         6           close
  2.91    0.000065          11         6           fstat64
  2.28    0.000051           9         6           rt_sigprocmask
  2.15    0.000048          24         2           access
  1.75    0.000039          13         3           uname
  1.66    0.000037          19         2           stat64
  0.40    0.000009           9         1           getpgrp
  0.40    0.000009           9         1           getuid32
  0.36    0.000008           8         1           time
  0.36    0.000008           8         1           getppid
  0.36    0.000008           8         1           getgid32
  0.31    0.000007           7         1           getpid
  0.27    0.000006           6         1           execve
  0.27    0.000006           6         1           geteuid32
  0.27    0.000006           6         1           getegid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.002234                    95         3 total

Thanks.

Xuehai

Petersson, Mats wrote:
> [snip]
Stephen C. Tweedie
2005-Nov-28 16:07 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Hi,

On Mon, 2005-11-28 at 09:50 -0600, xuehai zhang wrote:
> I mounted the loopback file in dom0, chrooted to the mountpoint and redid the
> experiment. The results are attached below. The times of the open and stat64
> calls are similar to the XenLinux case and much smaller than in the standard
> Linux case. So either using the loopback file as the backend of XenLinux or
> mounting it directly in the local filesystem yields some benefit (maybe just
> caused by the extra layer of block caching) for the performance of some system
> calls.

What time, exactly, is being measured here? Is it wall-clock gettimeofday? Or only elapsed time inside the domU?

It would be enlightening to re-measure the performance by counting the number of times these operations can be completed in a second, just to get a second opinion and to determine whether the difference is genuine or just an artifact of how we're measuring time when multiple domains are involved.

--Stephen
Petersson, Mats
2005-Nov-28 16:15 UTC
RE: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
> -----Original Message-----
> From: xuehai zhang [mailto:hai@cs.uchicago.edu]
> Sent: 28 November 2005 15:51
> To: Petersson, Mats
> Cc: Anthony Liguori; Xen Mailing List; Kate Keahey; Tim Freeman
> Subject: Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
>
> I mounted the loopback file in dom0, chrooted to the mountpoint and redid the
> experiment. The results are attached below. The times of the open and stat64
> calls are similar to the XenLinux case and much smaller than in the standard
> Linux case. So either using the loopback file as the backend of XenLinux or
> mounting it directly in the local filesystem yields some benefit (maybe just
> caused by the extra layer of block caching) for the performance of some system
> calls.

Yes, I think the caching of the blocks in two layers will be the reason you get this effect. The loopback file is cached once in the fs handling the REAL HARD DISK, and then other blocks would be cached in the fs handling the loopback. In this case the directory of the file(s) involved in your benchmark is probably held entirely in memory, whilst when you use a real disk to do the same thing, you could end up with some "real" accesses to the disk device itself.

The next question will probably be why write is slower on Xen+Linux than on native Linux - something I can't say for sure, but I would expect it to be because the write goes through Xen in the Xen+Linux case and straight through Linux in the native case. But that's just a guess. [And since it's slower in Xen, I don't expect you to be surprised by this.] And here the write call is almost identical to native Linux, as you'd expect.

--
Mats

> # strace -c /bin/sh -c /bin/echo foo
>
> [snip of table]
>
> Thanks.
>
> Xuehai

[snip]
xuehai zhang
2005-Nov-28 16:27 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Stephen,

Thanks for the reply.

>> I mounted the loopback file in dom0, chrooted to the mountpoint and redid the
>> experiment. [snip]
>
> What time, exactly, is being measured here? Is it wall-clock gettimeofday? Or
> only elapsed time inside the domU?

The time is collected and reported by strace using the gettimeofday function.

> It would be enlightening to re-measure the performance by counting the number of
> times these operations can be completed in a second, just to get a second opinion
> and to determine whether the difference is genuine or just an artifact of how
> we're measuring time when multiple domains are involved.

Your questioning makes sense to me, but I am not very sure how to effectively count how many of these operations can be completed in a second. Can you give me some hint?

Thanks again.

Xuehai
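[Editorial note: the following is an illustrative sketch only, not strace's actual mechanism (strace measures syscall entry/exit from outside the process via ptrace). It shows what a gettimeofday-delta per-call measurement amounts to: a wall-clock interval, which under a hypervisor includes whatever else gets scheduled in between.]

/* Illustrative sketch: time a single stat call with gettimeofday,
 * the same wall-clock source the strace -c numbers are derived from. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>

int main(void)
{
    struct timeval before, after;
    struct stat st;

    gettimeofday(&before, NULL);
    stat("/bin/echo", &st);          /* the syscall being measured */
    gettimeofday(&after, NULL);

    long usecs = (after.tv_sec - before.tv_sec) * 1000000L
               + (after.tv_usec - before.tv_usec);
    printf("stat took %ld usecs (wall clock)\n", usecs);
    return 0;
}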
Stephen C. Tweedie
2005-Nov-28 17:10 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Hi,

On Mon, 2005-11-28 at 10:27 -0600, xuehai zhang wrote:
> Your questioning makes sense to me, but I am not very sure how to effectively
> count how many of these operations can be completed in a second. Can you give me
> some hint?

Here's a quick-and-dirty wrapper for timing "something" over the space of a few seconds (default 10) and working out how fast it went. Obviously, you want to be running this on an otherwise idle machine, and with CPU frequency management disabled. It's really, really dumb, but it only uses "time()", not any subsecond time source, for its work.

Cheers,
Stephen
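[Editorial note: the attachment itself is not preserved in this archive. The sketch below is a hypothetical reconstruction of the kind of wrapper described, assuming the approach stated above: run the operation under test for a fixed number of wall-clock seconds, count completions, and use only time(). The function names here are invented, not taken from Stephen's code.]

/* Hypothetical sketch of a calls-per-second wrapper using only time(). */
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

#define SECONDS 10

static void dosomething(void)
{
    struct stat st;
    stat("/bin/echo", &st);          /* the operation under test */
}

static unsigned long count_cycles(void (*fn)(void), int seconds)
{
    time_t start;
    unsigned long cycles = 0;

    /* Wait for a fresh timer tick so we start on a second boundary. */
    start = time(NULL);
    while (time(NULL) == start)
        ;
    start = time(NULL);

    do {
        fn();
        cycles++;
    } while (time(NULL) - start < seconds);

    return cycles;
}

int main(void)
{
    unsigned long cycles = count_cycles(dosomething, SECONDS);
    printf("completed %lu cycles in %d seconds (%.2f calls/sec)\n",
           cycles, SECONDS, (double)cycles / SECONDS);
    return 0;
}

Comparing the calls/sec figure between XenLinux and native Linux gives a rate that does not depend on per-call timestamps, which is the point of the suggestion.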
xuehai zhang
2005-Nov-28 17:16 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Petersson, Mats wrote:
>> I mounted the loopback file in dom0, chrooted to the mountpoint and redid the
>> experiment. The results are attached below. The times of the open and stat64
>> calls are similar to the XenLinux case and much smaller than in the standard
>> Linux case. So either using the loopback file as the backend of XenLinux or
>> mounting it directly in the local filesystem yields some benefit (maybe just
>> caused by the extra layer of block caching) for the performance of some system
>> calls.
>
> Yes, I think the caching of the blocks in two layers will be the reason you get
> this effect. The loopback file is cached once in the fs handling the REAL HARD
> DISK, and then other blocks would be cached in the fs handling the loopback.

Is "the fs handling the REAL HD" dom0's filesystem? Is the cache used here dom0's disk buffer cache or something else? What is "the fs handling the loopback" - the filesystem seen inside XenLinux, or still dom0's filesystem? And what cache is used in that case?

> In this case the directory of the file(s) involved in your benchmark is probably
> held entirely in memory, whilst when you use a real disk to do the same thing,
> you could end up with some "real" accesses to the disk device itself.

To confirm our hypothesis that two-layer block caching is the real cause, what experiments can I do to show that a block is read from a cache instead of the hard disk on XenLinux, while it has to be read from the hard disk on standard Linux? Maybe I can use "vmstat" in dom0 to track blocks received/sent during the execution of the benchmark.

> The next question will probably be why write is slower on Xen+Linux than on
> native Linux - something I can't say for sure, but I would expect it to be
> because the write goes through Xen in the Xen+Linux case and straight through
> Linux in the native case. But that's just a guess. [And since it's slower in Xen,
> I don't expect you to be surprised by this.] And here the write call is almost
> identical to native Linux, as you'd expect.

I also agree that the overhead of the write system call in the VM is caused by Xen. I actually ran a "dd" benchmark to create a disk file from /dev/zero on both machines, and the VM is slower than the physical machine, as we expect.

So, the benchmark experiments I've done so far suggest that XenLinux using loopback files as VBD backends shows better performance (faster execution) on some system calls, like open and stat64, but worse performance (slower execution) on other system calls, like write, than standard Linux. Does this mean different applications may behave differently on the VM than on standard Linux - in other words, that some applications run faster on the VM and some slower, compared with the physical machine?

Thanks.

Xuehai

> [snip of quoted tables]
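[Editorial note: one way to act on the vmstat idea above is to snapshot the dom0 read counters before and after the benchmark; if they do not move, the blocks were served from cache rather than the disk. The sketch below assumes a 2.6 kernel that exposes /proc/diskstats and a backing device named "hda" - both assumptions, not details from the thread. vmstat or iostat report the same counters.]

/* Sketch: print reads completed and sectors read for one disk device.
 * Run before and after the benchmark and compare the two snapshots. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "hda";   /* device name is an assumption */
    FILE *f = fopen("/proc/diskstats", "r");
    char line[256];

    if (!f) {
        perror("/proc/diskstats");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        unsigned int major, minor;
        unsigned long reads, reads_merged, sectors_read;
        char name[32];

        if (sscanf(line, "%u %u %31s %lu %lu %lu",
                   &major, &minor, name, &reads, &reads_merged,
                   &sectors_read) == 6 && strcmp(name, dev) == 0) {
            printf("%s: %lu reads completed, %lu sectors read\n",
                   name, reads, sectors_read);
            break;
        }
    }
    fclose(f);
    return 0;
}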
Petersson, Mats
2005-Nov-28 18:17 UTC
RE: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
> -----Original Message-----
> From: xuehai zhang [mailto:hai@cs.uchicago.edu]
> Sent: 28 November 2005 17:17
> To: Petersson, Mats
> Cc: Anthony Liguori; Xen Mailing List; Kate Keahey; Tim Freeman
> Subject: Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
>
> Is "the fs handling the REAL HD" dom0's filesystem? Is the cache used here dom0's
> disk buffer cache or something else? What is "the fs handling the loopback" - the
> filesystem seen inside XenLinux, or still dom0's filesystem? And what cache is
> used in that case?

In both native Linux and XenLinux, it's the file-system that handles the actual disk where the VBD file lives. The software that manages the file-system, aka the file-system driver, will have a block cache. Since you're then mounting a VBD through a different (maybe the same type of) file-system driver, which handles this "disk", it will have its own additional set of block caching. Of course, if a block is cached in the upper-level (VBD) file-system driver, the lower-level file-system driver doesn't get the call, and thus no action needs to be taken there. It also means that any access to a DIFFERENT block in the lower-level driver can be cached, because there's now more space in that cache, since the first cached access didn't even touch the second driver. It's similar to doubling the block cache. [The exact effect of doubling the block cache would depend on its internal architecture, and I know NOTHING about this, so I can't say whether it would have the same effect or a different one.]

> To confirm our hypothesis that two-layer block caching is the real cause, what
> experiments can I do to show that a block is read from a cache instead of the
> hard disk on XenLinux, while it has to be read from the hard disk on standard
> Linux? Maybe I can use "vmstat" in dom0 to track blocks received/sent during the
> execution of the benchmark.

I think we've just proven that it's the VBD that causes the speed difference. Within a few percent, your last test showed the same numbers as the Xen+Linux combination - writes being slower on Xen+Linux, which is what I'd expect, but otherwise the system calls are pretty much the same. I'd expect execve to be much slower on Xen+Linux, as it's messing about with loading an application and doing page-table manipulations. We get similar times for old_mmap, open is about the same, read is reasonably the same (slower on XenLinux, also expected, I think), similar for brk, munmap, etc.

So the main difference between XenLinux and native Linux is that you're running one with two levels of block caching; if you don't use the VBD/loopback-mounted file as your "disk", you only get one level of caching and thus slower access.

> I also agree that the overhead of the write system call in the VM is caused by
> Xen. I actually ran a "dd" benchmark to create a disk file from /dev/zero on both
> machines, and the VM is slower than the physical machine, as we expect.
>
> So, the benchmark experiments I've done so far suggest that XenLinux using
> loopback files as VBD backends shows better performance (faster execution) on
> some system calls, like open and stat64, but worse performance (slower execution)
> on other system calls, like write, than standard Linux. Does this mean different
> applications may behave differently on the VM than on standard Linux - in other
> words, that some applications run faster on the VM and some slower, compared with
> the physical machine?

Yes, but for larger (i.e., more realistic) workloads it would probably even out, since the advantage of any cache is only really useful if you get lots of hits in that cache and the smaller cache isn't already big enough. If you have a cache of, say, 2MB, then the difference between 2MB and 4MB when you access 3.5MB of data would be very noticeable [numbers just pulled from thin air, I have no idea how much data a test case such as yours will access]. If, on the other hand, you run a full Linux kernel compile that accesses several hundred megabytes of source code and include files and generates several megabytes of object files, symbol files and the end-product binary, it would be much less noticeable whether the cache is 2MB, 4MB or even 32MB. And of course, this would perform a lot of read/write operations that would very likely eat up any of the previously gained performance benefit.

However, performing REAL tasks would be a much better benchmark of Xen performance than microbenchmarking some particular system call such as popen, since a REAL task would actually be something that a Xen-Linux user would perhaps do in their daily work. [Although compiling the kernel is fairly unlikely, I can certainly see people doing code development under Xen-Linux.] Running a large database under Xen-Linux would be another likely task, or a web server. Some of this would obviously take longer, other things may run faster - I have no idea which it is.

My point is rather that you're analyzing ONE C-library function and drawing big conclusions from it. Running a wider spread of loads on the system would give more accurate average results. Averages are good for describing something where you have many different samples, but as my father once said, "If you stand with one foot in ice and the other in hot water, on average your feet are comfortable". So don't JUST look at the average; look at the max/min values too to determine how things stand.

Yes, I do realize that testing the system's real-world performance with tasks such as compiling the Linux kernel or running an SQL database or web server is more complicated, involves less (or no) code-writing - so less fun - will not test PARTICULAR points of the kernel, and will spend a lot of time in user mode running exactly identical code on both systems. However, I think it's a better way to get a BIG-picture understanding of the differences between Xen and native Linux.

Btw, I would also suggest that it would make sense to test on a MODERN system - a Pentium III must be at least 5 years old by now, and if the rest of the components are of similar age, I suspect that the performance on a modern machine is at least 2-3x better.

--
Mats

> [snip of quoted tables]
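[Editorial note: a minimal illustration of the caching effect discussed above, assuming any existing path such as /bin/ls. The first open of a file may need metadata from disk, while an immediate repeat is served from the in-memory dentry/inode caches - which is essentially what a tiny run like "strace -c /bin/sh -c /bin/echo foo" ends up measuring once the relevant metadata is resident.]

/* Sketch: time the same open() twice in a row; the second call normally
 * hits the in-memory caches rather than the block device. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

static long time_open(const char *path)
{
    struct timeval a, b;
    gettimeofday(&a, NULL);
    int fd = open(path, O_RDONLY);
    gettimeofday(&b, NULL);
    if (fd >= 0)
        close(fd);
    return (b.tv_sec - a.tv_sec) * 1000000L + (b.tv_usec - a.tv_usec);
}

int main(void)
{
    const char *path = "/bin/ls";                       /* any existing file */
    printf("first open:  %ld usecs\n", time_open(path)); /* possibly cold */
    printf("second open: %ld usecs\n", time_open(path)); /* warm caches */
    return 0;
}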
Anthony Liguori
2005-Nov-28 19:37 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
xuehai zhang wrote:
> So, the benchmark experiments I've done so far suggest that XenLinux using
> loopback files as VBD backends shows better performance (faster execution) on
> some system calls, like open and stat64, but worse performance (slower execution)
> on other system calls, like write, than standard Linux. Does this mean different
> applications may behave differently on the VM than on standard Linux - in other
> words, that some applications run faster on the VM and some slower, compared with
> the physical machine?

I really want to stress here that your results do not mean system calls are faster in Xen (or with a loopback device). You're seeing the effects of caching, which are probably sub-optimal for overall performance.

A more meaningful benchmark here is to take a syscall-intensive workload and measure the throughput. You'll likely find that the syscall performance is actually worse, because when you start missing the cache you're going to get huge delays from read-ahead.

Be wary of microbenchmarks; they are often way off from real-world scenarios.

As for write performance: in general, read and write performance are going to be measurably slower in Xen than on bare-metal Linux. It almost has to be, by definition. The hope is that the difference in performance will be negligible, especially considering the improved throughput from greater utilization of the underlying hardware.

Regards,

Anthony Liguori
Tim Freeman
2005-Nov-29 04:50 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
On Mon, 28 Nov 2005 12:10:46 -0500
"Stephen C. Tweedie" <sct@redhat.com> wrote:

> Hi,
>
> On Mon, 2005-11-28 at 10:27 -0600, xuehai zhang wrote:
>
>> Your questioning makes sense to me, but I am not very sure how to effectively
>> count how many of these operations can be completed in a second. Can you give
>> me some hint?
>
> Here's a quick-and-dirty wrapper for timing "something" over the space of a few
> seconds (default 10) and working out how fast it went. Obviously, you want to be
> running this on an otherwise idle machine, and with CPU frequency management
> disabled. It's really, really dumb, but it only uses "time()", not any subsecond
> time source, for its work.
>
> Cheers,
> Stephen

Hi, this is cool. I was trying different calls in the 'timeme_dosomething' procedure - is there something about sleep that would be problematic?

void timeme_dosomething(void)
{
        sleep(1);
}

Waiting for fresh timer tick... done.
Timing noop for 10 seconds:
  completed 34319333 cycles in 10 seconds
Timing something for 10 seconds:
  completed 10 cycles in 10 seconds
Average time for something: 0.000000 seconds (291.381042 ns).

It gets the 10 cycles in 10 seconds right, but shouldn't the average be 1.0?

Thanks,
Tim
xuehai zhang
2005-Nov-29 12:39 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Anthony Liguori wrote:
> This may just be the difference between having the extra level of block caching
> from using a loopback device.
>
> Try running the same benchmark on a domain that uses an actual partition. While
> the syscalls may appear to be faster, I imagine it's because the cost of pulling
> in a block has already been paid, so the overall workload is unaffected.

I created a new domU using a physical partition instead of the loopback file as the backend of its VBDs, and I reran the "strace -c /bin/sh -c /bin/echo foo" benchmark inside the domU. The results are below. Compared with the results for the domU with loopback files as VBDs that I reported in the previous email (quoted below), the average times of the open/stat64 syscalls are very similar, and still much smaller than the values for standard Linux. If the reason open/stat64 run faster on the domU with loopback files as VBDs is the extra level of block caching from using a loopback device, why do open/stat64 still run similarly faster on the domU with a physical partition as the VBD, where there is no extra level of block caching from a loopback device?

XenLinux (physical partition as VBD):

root@cctest1:~/c# strace -c /bin/sh -c /bin/echo foo

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 39.56    0.001955        1955         1           write
 18.94    0.000936         936         1           execve
  7.65    0.000378          24        16           old_mmap
  7.57    0.000374          42         9         2 open
  6.27    0.000310          52         6           read
  5.10    0.000252          84         3           munmap
  4.92    0.000243           9        26           brk
  1.92    0.000095          14         7           close
  1.78    0.000088           8        11           rt_sigaction
  1.40    0.000069          10         7           fstat64
  1.01    0.000050           8         6           rt_sigprocmask
  0.93    0.000046          23         2           access
  0.79    0.000039          13         3           uname
  0.69    0.000034          17         2           stat64
  0.38    0.000019          19         1           ioctl
  0.16    0.000008           8         1           getppid
  0.16    0.000008           8         1           getpgrp
  0.14    0.000007           7         1           time
  0.14    0.000007           7         1           getuid32
  0.14    0.000007           7         1           getgid32
  0.12    0.000006           6         1           getpid
  0.12    0.000006           6         1           getegid32
  0.10    0.000005           5         1           geteuid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.004942                   109         2 total

Thanks.

Xuehai

> xuehai zhang wrote:
>> [snip of original message and tables]
xuehai zhang
2005-Nov-29 12:58 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
BTW, the Xen I used is compiled from the 2.0-testing source tree. The changeset shown by "xm dmesg" is:

# xm dmesg | more
[Xen ASCII-art banner]
 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory
 Xen version 2.0-testing (root@mcs.anl.gov) (gcc version 3.3.5 (Debian 1:3.3.5-13)) Sat Nov 26 20:01:57 CST 2005
 Latest ChangeSet: Sat Aug 27 21:43:33 2005 9d3927f57bb21707d4b6f04ff2d8a4addc6f7d71

Thanks.

Xuehai

xuehai zhang wrote:
> I created a new domU using a physical partition instead of the loopback file as
> the backend of its VBDs, and I reran the "strace -c /bin/sh -c /bin/echo foo"
> benchmark inside the domU.
>
> [snip of tables]
Petersson, Mats
2005-Nov-29 12:59 UTC
RE: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
> -----Original Message----- > From: xen-devel-bounces@lists.xensource.com > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of > xuehai zhang > Sent: 29 November 2005 12:40 > To: Anthony Liguori > Cc: Kate Keahey; Tim Freeman; Xen Mailing List > Subject: Re: [Xen-devel] open/stat64 syscalls run faster on > Xen VM than standard Linux > > Anthony Liguori wrote: > > This may just be the difference between having the extra level of > > block caching from using a loop back device. > > > > Try running the same benchmark on a domain that uses an actual > > partition. While the syscalls may appear to be faster, I > imagine it''s > > because the cost of pulling in a block has already been > payed so the > > overall workload is unaffected. > > I created a new domainU by using the physical partition > instead of the loopback file as the backends of the VBDs and > I reran the "strace -c /bin/sh -c /bin/echo foo" benchmark > inside of the domU. The following are the results. Comparing > with the results for domU with loopback files as VBDs I > reported in the previous email (quoted below), the average > time of open/stat64 syscalls are very similar, but still much > smaller than the values for standard Linux. If the reason > that open/stat64 run faster on domU with loopback files as > VBDs is because of the extra level of block caching from > using a loop back device, why open/stat64 still run > similarily faster on domU with physical partition as VBDs > when there is no extra level of block caching from using a > loop back device?Again, have you run the same test-case on the same physical partition? You should be able to do the same trick with mount and chroot for this one too. This is just to rule out other variations (where in the file-system different files live, just as an example). I''m also not entirely convinced that you don''t get some double-caching with this method too - there is the real hard-disk driver in Dom0, then the virtual disk driver in DomU, and the file-system driver in DomU. I don''t know if the real or virtual block-device driver itself will cache things or not. Someone who knows may want to comment on this... Still, I agree with Anthony - run some REAL loads on the system, rather than microbenchmark a particular, very small function of the whole system. Microbenchmarks are very good at showing how long some particular thing takes, but they are also very sensitive to extremely small changes in the system architecture, which for a bigger load would not be noticable. I''m doing my best to explain things here, but I''m not sure if I''m getting the point across, as it seems like you''re trying very hard to get back to the same original question. We do know that Xen behaves different than Linux. One reason for this is that it''s dealing with hardware devices through a virtual device. This will most of the time slow things down, but as you''ve seen, for example due to the double-caching effect, you may sometimes actually gain from it, particularly with very small workloads. 
-- Mats

> XenLinux (physical partition as VBDs)
>
> root@cctest1:~/c# strace -c /bin/sh -c /bin/echo foo
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  39.56    0.001955        1955         1           write
>  18.94    0.000936         936         1           execve
>   7.65    0.000378          24        16           old_mmap
>   7.57    0.000374          42         9         2 open
>   6.27    0.000310          52         6           read
>   5.10    0.000252          84         3           munmap
>   4.92    0.000243           9        26           brk
>   1.92    0.000095          14         7           close
>   1.78    0.000088           8        11           rt_sigaction
>   1.40    0.000069          10         7           fstat64
>   1.01    0.000050           8         6           rt_sigprocmask
>   0.93    0.000046          23         2           access
>   0.79    0.000039          13         3           uname
>   0.69    0.000034          17         2           stat64
>   0.38    0.000019          19         1           ioctl
>   0.16    0.000008           8         1           getppid
>   0.16    0.000008           8         1           getpgrp
>   0.14    0.000007           7         1           time
>   0.14    0.000007           7         1           getuid32
>   0.14    0.000007           7         1           getgid32
>   0.12    0.000006           6         1           getpid
>   0.12    0.000006           6         1           getegid32
>   0.10    0.000005           5         1           geteuid32
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.004942         109         2           total
>
> Thanks.
>
> Xuehai
>
> > xuehai zhang wrote:
[snip of tables]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
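For reference, the mount-and-chroot comparison Mats suggests could be run from dom0 roughly as follows. This is only a sketch: the device name (/dev/sda6 here) and mount point are illustrative, and it assumes the guest image contains a complete root file system with strace installed.

(with the partition-backed domU shut down, so the partition is not in use by two kernels at once)
# mount /dev/sda6 /mnt
# chroot /mnt strace -c /bin/sh -c /bin/echo foo
# umount /mnt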
xuehai zhang
2005-Nov-29 13:27 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
>> Your questioning makes sense to me. But I am not very sure about how
>> to effectively count how many of these operations can be completed
>> in a second. Can you give me a hint?
>
> Here's a quick-and-dirty wrapper for timing "something" over the space
> of a few seconds (default 10) and working out how fast it went.
> Obviously, you want to be running this on an otherwise-idle machine,
> and with CPU frequency management disabled. It's really, really dumb,
> but it only uses "time()", not any subsecond time sourcing, for its
> work.

Stephen,

Thanks for providing the code. I have one question about the
calculation of "how_long" (the average time of the tested system call
in timeme_dosomething()). In your original code it is calculated as:

how_long = 1.0 / ((float) (loops_noop - loops_something) / (TIME_SECS));

I think it should be:

how_long = (float)TIME_SECS/loops_something - (float)TIME_SECS/loops_noop;

Please correct me if I am wrong.

I applied the above modification to your code (I named it Tick.c) and
ran the Tick binary on 3 platforms (a physical machine, a domU with a
loopback file as VBDs, and a domU with a hard-disk partition as VBDs).
I put the results and discussion together in the following PDF:

http://people.cs.uchicago.edu/~hai/tmp/tick_performance.pdf

"The average stat64 system call time reported by the corrected tick
binary now accords with the time reported by strace. It runs faster on
both PVM and LVM than on PHY. PVM is the platform on which it runs
fastest."

Thanks.
Xuehai

> ------------------------------------------------------------------------
>
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <time.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/time.h>
>
> #define TIME_SECS 10
>
> static inline int time_is_before(time_t a, time_t b)
> {
>     return ((signed) (a-b)) < 0;
> }
>
> void timeme_noop(void)
> {
>     return;
> }
>
> void timeme_dosomething(void)
> {
>     struct stat statbuf;
>     stat("/", &statbuf);
> }
>
> int run_time_loop(const char *what, void (*fn)(void))
> {
>     time_t end;
>     int count = 0;
>
>     end = time(NULL) + TIME_SECS;
>
>     printf("Timing %s for %d seconds: ", what, TIME_SECS);
>     fflush(stdout);
>     do {
>         ++count;
>         fn();
>     } while (time_is_before(time(NULL), end));
>     printf("completed %d cycles in %d seconds\n", count, TIME_SECS);
>     return count;
> }
>
> void wait_for_tick(void)
> {
>     time_t end;
>     printf("Waiting for fresh timer tick...");
>     fflush(stdout);
>     end = time(NULL) + 1;
>     while (time_is_before(time(NULL), end));
>     printf(" done.\n");
> }
>
> int main(void)
> {
>     int loops_noop, loops_something;
>     float how_long;
>
>     wait_for_tick();
>     loops_noop = run_time_loop("noop", timeme_noop);
>     loops_something = run_time_loop("something", timeme_dosomething);
>
>     how_long = 1.0 / ((float) (loops_noop - loops_something) / (TIME_SECS));

I wonder if this line should be:

how_long = (float)TIME_SECS/loops_something - (float)TIME_SECS/loops_noop;

>     printf("Average time for something: %f seconds (%f ns).\n",
>            how_long, how_long * 1000000000.0);
> }

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
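To make the difference between the two formulas concrete, take some purely illustrative loop counts (not measured values): suppose the noop loop completes 50,000,000 iterations in the 10-second window and the stat loop completes 400,000. The corrected formula gives 10/400,000 - 10/50,000,000 seconds, i.e. 25 usecs - 0.2 usecs, or about 24.8 usecs per stat64 call: the per-iteration cost of the stat loop minus the loop overhead measured by the noop run. The original expression, 10/(50,000,000 - 400,000), would instead report roughly 0.2 usecs, which mostly reflects the noop loop rate rather than the cost of the call.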
xuehai zhang
2005-Nov-29 13:48 UTC
Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM than standard Linux
Mats,

Thank you so much for your help.

[snip]

> Again, have you run the same test-case on the same physical partition?
> You should be able to do the same trick with mount and chroot for this
> one too.

Yes, I did. I shut down the partition-based domU first, then mounted
the partition and chrooted into it, and reran "strace -c /bin/sh -c
/bin/echo foo". The results below show that the write cost is similar
to the value on the physical machine, while open and stat64 are still
similar to the values on the loopback-file-based domU.

# strace -c /bin/sh -c /bin/echo foo
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 21.17    0.000482         482         1           write
 16.07    0.000366          24        15           old_mmap
 14.54    0.000331          37         9         3 open
 11.55    0.000263          10        26           brk
  8.83    0.000201          40         5           read
  7.47    0.000170          85         2           munmap
  4.04    0.000092           8        11           rt_sigaction
  3.16    0.000072          12         6           close
  2.85    0.000065          11         6           fstat64
  2.20    0.000050          25         2           access
  2.20    0.000050           8         6           rt_sigprocmask
  1.80    0.000041          14         3           uname
  1.27    0.000029          15         2           stat64
  0.35    0.000008           8         1           time
  0.35    0.000008           8         1           getppid
  0.35    0.000008           8         1           getpgrp
  0.35    0.000008           8         1           getuid32
  0.35    0.000008           8         1           getgid32
  0.31    0.000007           7         1           getpid
  0.26    0.000006           6         1           execve
  0.26    0.000006           6         1           geteuid32
  0.26    0.000006           6         1           getegid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.002277         103         3           total

Some more information: the partition I use is /dev/sda6, which is
mounted at /tmp in dom0.
# mount /tmp
# df -lh
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             1.5G  728M  661M  53% /
tmpfs                  62M  4.0K   62M   1% /dev/shm
/dev/sda5             938M  250M  640M  29% /var
/dev/sda6             4.2G  2.9G  1.2G  72% /tmp
# ls /tmp
bcfg-run.nightly  dbuta/  initrd/      media/              mpd2.console_root=    opt/      sbin/  usr/
beckman/          dev/    lib/         mnt/                mpd2.logfile_rbutler  proc/     srv/   var/
bin/              etc/    lost+found/  mpd2/               mpd2.logfile_robl     root/     sys/   vosg/
boot/             home/   mdtest*      mpd2.console_robl=  mpd2.logfile_root     sandbox/  tmp/   zoid_threads/

> This is just to rule out other variations (where in the file-system
> different files live, just as an example).
>
> I'm also not entirely convinced that you don't get some double-caching
> with this method too - there is the real hard-disk driver in Dom0,
> then the virtual disk driver in DomU, and the file-system driver in
> DomU. I don't know whether the real or virtual block-device driver
> itself caches things or not. Someone who knows may want to comment on
> this...

I would like to know the caching mechanism of the real and virtual
block-device drivers too.

> Still, I agree with Anthony - run some REAL loads on the system,
> rather than microbenchmarking one particular, very small function of
> the whole system. Microbenchmarks are very good at showing how long
> some particular operation takes, but they are also very sensitive to
> extremely small changes in the system architecture, which for a bigger
> load would not be noticeable.
>
> I'm doing my best to explain things here, but I'm not sure I'm getting
> the point across, as it seems like you're trying very hard to get back
> to the same original question. We do know that Xen behaves differently
> from Linux. One reason for this is that it deals with hardware devices
> through a virtual device. Most of the time this slows things down,
> but, as you've seen, you may sometimes actually gain from it - for
> example due to the double-caching effect - particularly with very
> small workloads.

I am sorry that I didn't explain earlier why I am running all these
microbenchmarks. The reason is that the VM outperformed the physical
machine when I originally ran a real application on both of them (the
VM and the physical machine are configured as similarly as possible).
We all expect a Xen VM to perform a little worse than a comparable
physical machine, so the performance "gain" when running the same
application on the VM is an abnormal observation, and it has worried me
a lot over the past week. That is why I spent so much time tracing what
happens in the application down to the level of system calls. So the
problem originally comes from running the actual application.

Thanks.
Xuehai

[snip of tables]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel