xuehai zhang
2005-Nov-23 20:25 UTC
[Xen-devel] a question about popen() performance on domU
Dear all,

When I compared the performance of an application on a Xen domU and on a standard Linux machine (the domU runs on a similar physical machine), I noticed that the application runs faster on the domU than on the physical machine. Instrumenting the application shows that the difference comes from popen() calls, which take less time on the domU than on the physical machine. Does xenlinux make some special modification to the popen code to improve its performance over the original Linux popen code?

Thanks in advance for your help.

Xuehai
Petersson, Mats
2005-Nov-24 10:07 UTC
RE: [Xen-devel] a question about popen() performance on domU
I did have a look at popen, and essentially it does the following [the real code is MUCH more complicated, doing lots of open/dup/close on pipes and stuff]:

    if (!fork())
        execl("/bin/sh", "sh", "-c", cmd, NULL);

The fork creates another process, which then executes /bin/sh, which in turn does another fork/exec to run the actual command given.

So the major components of popen are fork() and execl(), both of which cause, amongst other things, a lot of page-table work and task switching.

Note that popen is implemented in glibc [I took the 2.3.6 source code from www.gnu.org for my look at this], so there is no difference in the implementation of popen itself - the difference lies in how the Linux kernel handles fork() and exec(), and perhaps more importantly, how task switches and page tables are handled in native Linux and Xen-Linux. Because Xen keeps track of the page tables on top of Linux's own handling of them, you get some extra work here. So it should really be slower on Xen than on native Linux. [In fact, the question came up not so long ago of why Xen was SLOWER than native Linux on popen (and some others) in a particular benchmark, and the result of that investigation was that it is mainly down to task switches taking longer under Xen.]

The reason it is not slower here probably has something to do with differences between the native Linux and Xen setups - perhaps the fact that your file system is a virtual block device, and thus lives inside a file that is better cached or otherwise handled differently on the Xen system.

Now, I'm not saying that there isn't a possibility that something is managed differently in Xen that makes this run faster - I just don't really see how that would be likely, since everything that happens in the system is going to be MORE complicated by the extra layer of Xen involved.

If anyone else has some thoughts on this subject, it would be interesting to hear.

-- Mats
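P.S. In case it is useful, here is that skeleton as a small compilable program - just the fork/exec/wait core, with none of glibc's pipe plumbing or error handling, so it illustrates the mechanism rather than the real popen/pclose implementation:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* The core of what popen() does: fork, then have the child run
     * "/bin/sh -c cmd".  No pipe is set up between parent and child. */
    static int run_via_shell(const char *cmd)
    {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;                       /* fork failed */
        if (pid == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);                      /* exec failed */
        }
        if (waitpid(pid, &status, 0) < 0)    /* what pclose() ends up doing */
            return -1;
        return status;
    }

    int main(void)
    {
        return run_via_shell("true") == 0 ? 0 : 1;
    }

Running this under strace shows the fork/clone, execve and waitpid calls that dominate the popen/pclose cost.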
xuehai zhang
2005-Nov-24 14:02 UTC
Re: [Xen-devel] a question about popen() performance on domU
Mats,

Thanks a lot for the response.

> I did have a look at popen, and essentially it does the following [the
> real code is MUCH more complicated, doing lots of open/dup/close on
> pipes and stuff]:
>     if (!fork())
>         execl("/bin/sh", "sh", "-c", cmd, NULL);

I took a look at the popen source code yesterday too, and the above lines are the essential part. A thread on a GNU list (http://lists.gnu.org/archive/html/bug-global/2005-06/msg00001.html) suggests popen() performance may depend on how fast /bin/sh is executed. On both my VM and the physical machine the kernel version is 2.6.11, the glibc version is 2.3.2.ds1-21, and /bin/sh is linked to /bin/bash. I also looked for differences in the shared libraries used by /bin/sh on the two machines and found that /bin/sh on the physical machine uses libraries from /lib/tls, while on the VM this directory is disabled.

VM$ ldd /bin/sh
        libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa7000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7fa3000)
        libc.so.6 => /lib/libc.so.6 (0xb7e70000)
        /lib/ld-.so.2 => /lib/ld-.so.2 (0xb7fea000)

PHYSICAL$ ldd /bin/sh
        libncurses.so.5 => /lib/libncurses.so.5 (0xb7fa6000)
        libdl.so.2 => /lib/tls/libdl.so.2 (0xb7fa2000)
        libc.so.6 => /lib/tls/libc.so.6 (0xb7e6d000)
        /lib/ld-.so.2 => /lib/ld-.so.2 (0xb7fea000)

> So the major components of popen are fork() and execl(), both of which
> cause, amongst other things, a lot of page-table work and task
> switching. [...] Because Xen keeps track of the page tables on top of
> Linux's own handling of them, you get some extra work here. So it
> should really be slower on Xen than on native Linux.
> [In fact, the question came up not so long ago of why Xen was SLOWER
> than native Linux on popen (and some others) in a particular benchmark,
> and the result of that investigation was that it is mainly down to task
> switches taking longer under Xen.]

I agree with your explanation of why Xen would be SLOWER than native Linux on popen, because of the longer task switching in Xen. That is what makes the behaviour I am seeing (popen runs faster on the Xen VM than on the physical machine) look abnormal. I ran several home-made benchmark programs and used the "strace" tool to trace the system call performance.

The first program tests popen and pclose together: a loop of popen calls, each followed by a pclose. The source of the program and the strace results are available at http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-popen/strace.txt. The results show that the waitpid syscall costs more time on the physical machine than on the VM (see the usecs/call values in the table below).

                    % time     seconds  usecs/call     calls    errors  syscall
                    ------ ----------- ----------- --------- ---------  -------
VM:                  63.43    0.127900        6395        20            waitpid
PHYSICAL MACHINE:    93.87    0.532498       26625        20            waitpid

waitpid is called by pclose, as shown in the glibc source code, so my original post questioning the performance of popen should take pclose into consideration too.
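In outline, the loop in that first program looks like this (a simplified sketch - the command and iteration count here are placeholders; the full source is at the link above):

    #include <stdio.h>

    #define ITERATIONS 20     /* matches the 20 calls in the strace output */

    int main(void)
    {
        int i;
        char buf[256];

        for (i = 0; i < ITERATIONS; i++) {
            /* popen() forks and execs "/bin/sh -c <cmd>" ... */
            FILE *fp = popen("true", "r");
            if (fp == NULL)
                return 1;
            /* ... read anything the command prints ... */
            while (fgets(buf, sizeof(buf), fp) != NULL)
                ;
            /* ... and pclose() reaps the shell with waitpid(). */
            if (pclose(fp) == -1)
                return 1;
        }
        return 0;
    }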
A more accurate question than my original post, then, is why popen+pclose executes faster on my VM than on my physical machine. The popen/pclose benchmark narrows the problem down to waitpid: waitpid is somehow suffering on the physical machine.

So I did a follow-up experiment to test fork and waitpid performance on both machines. The program is a loop of fork calls, each followed by a waitpid call (an outline appears at the end of this mail). The source of the program and the strace results are available at http://people.cs.uchicago.edu/~hai/tmp/gt2gram/strace-fork/strace.txt. The strace results confirm that waitpid costs more time on the physical machine (154 usecs/call) than on the VM (56 usecs/call). However, this program runs faster on the physical machine (unlike the popen/pclose program), and the results suggest that the fork syscall used on the VM costs more time than the clone syscall used on the physical machine. This raises a question: why does the physical machine use the clone syscall rather than the fork syscall for the same program?

> The reason it is not slower here probably has something to do with
> differences between the native Linux and Xen setups - perhaps the fact
> that your file system is a virtual block device, and thus lives inside
> a file that is better cached or otherwise handled differently on the
> Xen system.

Let me describe the hardware context of my VM and the physical machine. The host of my VM and the physical machine I tested against it are two nodes of a physical cluster with the same hardware configuration (dual Intel PIII 498.799 MHz CPUs, 512MB memory, a 4GB HD with the same partitions). The physical machine is rebooted with "nosmp". The VM host is rebooted into Xen with "nosmp" (the Xen version information is "Latest ChangeSet: 2005/05/03 17:30:40 1.1846 4277a730mvnFSFXrxJpVRNk8hjD4Vg"). Xen dom0 is assigned 96MB of memory, and the VM is the only user domain running on the VM host, with 395MB of memory. Both dom0 and the VM are pinned to CPU 0.

Yes, the backends of the VM's VBDs are loopback files in dom0. Three loopback files are mapped to three partitions inside the VM. I actually thought about a possible caching effect of the VM's VBD backends, but I am not sure how to verify it and compare it with the physical machine. Is it possible that Xen gives different write-back guarantees than the physical machine, that is, that data is kept in memory longer before it is actually written to disk?

> Now, I'm not saying that there isn't a possibility that something is
> managed differently in Xen that makes this run faster - I just don't
> really see how that would be likely, since everything that happens in
> the system is going to be MORE complicated by the extra layer of Xen
> involved.
>
> If anyone else has some thoughts on this subject, it would be
> interesting to hear.

I agree. But given that the VM has the same hardware/software configuration as the physical machine, it still looks abnormal to me that it runs faster. I wonder whether there are other, more efficient debugging strategies I could use to investigate this. I would appreciate any further suggestions.

Thanks again.
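In outline, the fork/waitpid loop mentioned above looks like this (again a simplified sketch; the iteration count is arbitrary and the full source is at the link):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define ITERATIONS 1000    /* arbitrary */

    int main(void)
    {
        int i, status;

        for (i = 0; i < ITERATIONS; i++) {
            pid_t pid = fork();           /* shows up as fork on the VM,
                                             clone on the physical machine */
            if (pid < 0)
                return 1;
            if (pid == 0)
                _exit(0);                 /* child exits immediately */
            waitpid(pid, &status, 0);     /* the call whose cost differs */
        }
        return 0;
    }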
Xuehai
Petersson, Mats
2005-Nov-24 14:47 UTC
RE: [Xen-devel] a question about popen() performance on domU
See comments below.

> -----Original Message-----
> From: xuehai zhang [mailto:hai@cs.uchicago.edu]
> Sent: 24 November 2005 14:02
> To: Petersson, Mats
> Cc: xen-devel@lists.xensource.com; Tim Freeman; Kate Keahey
> Subject: Re: [Xen-devel] a question about popen() performance on domU
>
> [...] I also looked for differences in the shared libraries used by
> /bin/sh on the two machines and found that /bin/sh on the physical
> machine uses libraries from /lib/tls, while on the VM this directory is
> disabled.

In this particular case, I would think that lib/tls is not a factor, but it may be worth disabling the tls libraries on the physical machine too, just to make sure... [just "mv /lib/tls /lib/tls.disabled" should do it].

> [...] However, this program runs faster on the physical machine (unlike
> the popen/pclose program), and the results suggest that the fork syscall
> used on the VM costs more time than the clone syscall used on the
> physical machine. This raises a question: why does the physical machine
> use the clone syscall rather than the fork syscall for the same program?

Because it's using the same source for glibc! glibc says to use _IO_fork(), which calls the fork syscall. Clone would probably do the same thing, but for whatever good or bad reason, the author(s) of this code chose to use fork. There may be good reasons, or no reason at all, to do it this way - I couldn't say. I don't think it makes a whole lot of difference if the actual command executed by popen is actually "doing something", rather than just an empty "return".

> [...] Yes, the backends of the VM's VBDs are loopback files in dom0.
> Three loopback files are mapped to three partitions inside the VM. I
> actually thought about a possible caching effect of the VM's VBD
> backends, but I am not sure how to verify it and compare it with the
> physical machine. Is it possible that Xen gives different write-back
> guarantees than the physical machine, that is, that data is kept in
> memory longer before it is actually written to disk?

Xen itself doesn't know ANYTHING about the disk/file where the data for Dom0 or DomU comes from, so no, Xen would not do that. However, the loopback file system that is involved in VBDs could potentially do things that are different from the actual hardware.

I think you should be able to mount the virtual disk as a "device" on your system. I don't know off the top of my head how to do that, but essentially something like this: mount myimage.hdd loop/ -t ext3 [additional parameters may be needed]. You could then do "chroot loop/" and perform your tests there. This should execute the same thing from the same place on native Linux as you would in DomU.

Now, this may not run faster on native than your original setup, but I wouldn't be surprised if it does...

-- Mats
xuehai zhang
2005-Nov-24 15:40 UTC
Re: [Xen-devel] a question about popen() performance on domU
> See comments below.

Thanks Mats. I have more questions about your comments below.

Xuehai

> In this particular case, I would think that lib/tls is not a factor,
> but it may be worth disabling the tls libraries on the physical machine
> too, just to make sure... [just "mv /lib/tls /lib/tls.disabled" should
> do it].

I don't think /lib/tls is the factor either. I did rerun the tests with tls disabled on the physical machine, and it gave even worse performance, so I switched it back.

> Because it's using the same source for glibc! glibc says to use
> _IO_fork(), which calls the fork syscall. Clone would probably do the
> same thing, but for whatever good or bad reason, the author(s) of this
> code chose to use fork. There may be good reasons, or no reason at all,
> to do it this way - I couldn't say.

Do you have any suggestion as to why the same code uses different syscalls on two machines that have the same kernel and glibc?

> Xen itself doesn't know ANYTHING about the disk/file where the data for
> Dom0 or DomU comes from, so no, Xen would not do that. However, the
> loopback file system that is involved in VBDs could potentially do
> things that are different from the actual hardware.

So there is a possibility that the loopback file system does something tricky, like caching, that results in better performance for applications running inside the VM?

> I think you should be able to mount the virtual disk as a "device" on
> your system.

What does "your system" refer to here? Does it mean dom0 or inside domU?

> I don't know off the top of my head how to do that, but essentially
> something like this: mount myimage.hdd loop/ -t ext3 [additional
> parameters may be needed]. You could then do "chroot loop/" and perform
> your tests there. This should execute the same thing from the same
> place on native Linux as you would in DomU.
>
> Now, this may not run faster on native than your original setup, but I
> wouldn't be surprised if it does...

This is interesting. I will try to run the same tests if I can mount the virtual disk as a "device" successfully.

Thanks.

Xuehai
Petersson, Mats
2005-Nov-24 15:51 UTC
RE: [Xen-devel] a question about popen() performance on domU
> -----Original Message-----
> From: xuehai zhang [mailto:hai@cs.uchicago.edu]
> Sent: 24 November 2005 15:41
> To: Petersson, Mats
> Cc: xen-devel@lists.xensource.com; Tim Freeman; Kate Keahey
> Subject: Re: [Xen-devel] a question about popen() performance on domU
>
> Do you have any suggestion as to why the same code uses different
> syscalls on two machines that have the same kernel and glibc?

That I can't explain. I guess one possibility is that, in some way, the fork() call gets translated to clone() at some other level.
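(The two are close relatives at the syscall level anyway; as an illustration - a sketch, not glibc's code - a plain clone call with only SIGCHLD set behaves like fork:)

    /* Illustration only: fork-like process creation via the raw clone
     * syscall.  SIGCHLD as the only flag and a NULL child stack give the
     * same copy-the-parent semantics that fork() has. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int status;
        long pid = syscall(SYS_clone, SIGCHLD, 0, 0, 0, 0);

        if (pid < 0)
            return 1;                 /* clone failed */
        if (pid == 0)
            _exit(0);                 /* child, exactly as after fork() */
        waitpid((pid_t)pid, &status, 0);
        return 0;
    }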
I did a grep for _IO_fork in the source for glibc, and it comes back as "#define _IO_fork fork".

> What does "your system" refer to here? Does it mean dom0 or inside domU?

"Your system" here refers to "PHYSICAL".

> This is interesting. I will try to run the same tests if I can mount
> the virtual disk as a "device" successfully.

Please share the results... ;-)
xuehai zhang
2005-Nov-26 00:37 UTC
Re: [Xen-devel] a question about popen() performance on domU
> Because it's using the same source for glibc! glibc says to use
> _IO_fork(), which calls the fork syscall. Clone would probably do the
> same thing, but for whatever good or bad reason, the author(s) of this
> code chose to use fork. There may be good reasons, or no reason at all,
> to do it this way - I couldn't say. I don't think it makes a whole lot
> of difference if the actual command executed by popen is actually
> "doing something", rather than just an empty "return".

Mats,

I am not very sure about your comment in the last sentence. Are you suggesting that the command passed to popen should have no big effect on popen's performance?

Thanks.
Xuehai
Petersson, Mats
2005-Nov-28 10:01 UTC
RE: [Xen-devel] a question about popen() performance on domU
> Mats,
> I am not very sure about your comment in the last sentence. Are you
> suggesting that the command passed to popen should have no big effect
> on popen's performance?

No, my point was that clone and fork are very similar, and if you ACTUALLY do something in the forked/cloned process, the difference between the two process-creation mechanisms would be very small - however, if you don't do anything inside the popen, you get to see the difference.

The thing is that popen is meant to spawn a process that actually does something - at least that's the general idea. That is why this sort of microbenchmark that tests a particular system call is often quite useless - if you were to spawn off a gcc compile, do you think the time taken to actually perform popen would be noticeable, compared to the compiling of some kilobytes of source code [if you use -pipe on gcc, I believe it would use popen to create the next-level compile]?

Application benchmarks are much more meaningful for end users, and with the right tools they can be used to direct the kernel developers to look at the right areas of the kernel to increase the performance of the system - which is probably not going to be sub-optimisations inside popen.

-- Mats
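P.S. To make that concrete, a typical real-world popen() reads the output of a command that does actual work, along these lines (a generic usage sketch; the command is only an example):

    #include <stdio.h>

    int main(void)
    {
        char line[512];
        /* The child does real work here (just printing the compiler
         * version); the fork/exec overhead is then a small part of the
         * total time spent between popen() and pclose(). */
        FILE *fp = popen("gcc --version", "r");

        if (fp == NULL) {
            perror("popen");
            return 1;
        }
        while (fgets(line, sizeof(line), fp) != NULL)
            fputs(line, stdout);
        return pclose(fp) == -1 ? 1 : 0;
    }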