I am re-posting this from the xen-users mailing list.

I found that Xen PV Linux performance is very poor compared with native
Linux or PVHVM Linux in some cases, such as system calls (which cause a
context switch). Here is a very simple sample:

#include <sys/time.h>

double getTime() {
    struct timeval t;
    gettimeofday(&t, 0);
    return (double) t.tv_sec + (double) t.tv_usec / 1000000.0;
}

int getInc(int sum) {
    return sum + 1;
}

int main() {
    int i;
    for (i = 0; i < 10000000; i++) {
        getTime();
    }
    return 0;
}

In a PV Linux guest, this is 10 times slower than in a PVHVM Linux guest.
When calling getInc() 10000000 times instead, the PV guest is a little
faster than PVHVM. So it seems that the PV Linux guest has poor
performance in the context switch case.

How can I tune this, or is there any plan to fix this issue?
I'm looking forward to your feedback. Thank you.

---------------------------------------------------------------------------
Xen 4.1.2 + dom0 kernel 3.2 + Ubuntu 12.04 guest, Intel(R) Xeon(R) CPU E5620

Jacky
On Sat, Jun 30, 2012 at 10:54:34PM +0800, Zhou Jacky wrote:
> I am re-posting this from the xen-users mailing list.

Hello,

> I found that Xen PV Linux performance is very poor compared with native
> Linux or PVHVM Linux in some cases, such as system calls (which cause a
> context switch). Here is a very simple sample:
>
> [code sample snipped]
>
> In a PV Linux guest, this is 10 times slower than in a PVHVM Linux guest.
> When calling getInc() 10000000 times instead, the PV guest is a little
> faster than PVHVM. So it seems that the PV Linux guest has poor
> performance in the context switch case.
>
> How can I tune this, or is there any plan to fix this issue?
> I'm looking forward to your feedback. Thank you.

Is this a 64-bit PV domU? If yes, then try with a 32-bit PAE PV kernel.

64-bit PV has a performance hit due to the x86_64 (amd64) architecture
design: there are not enough ring levels in the amd64 spec for the
hypervisor/kernel/user split, so syscalls from usermode need to be
trapped by the hypervisor.

32-bit x86 does not have this limitation, i.e. x86 has enough ring levels.

If you need a 64-bit VM, then it's better to use Xen PVHVM for this kind
of workload with a lot of syscalls.

> ---------------------------------------------------------------------------
> Xen 4.1.2 + dom0 kernel 3.2 + Ubuntu 12.04 guest, Intel(R) Xeon(R) CPU E5620
>
> Jacky

-- Pasi
At 22:54 +0800 on 30 Jun (1341096874), Zhou Jacky wrote:
> I am re-posting this from the xen-users mailing list.
>
> I found that Xen PV Linux performance is very poor compared with native
> Linux or PVHVM Linux in some cases, such as system calls (which cause a
> context switch). Here is a very simple sample:
>
> [code sample snipped]
>
> In a PV Linux guest, this is 10 times slower than in a PVHVM Linux guest.
> When calling getInc() 10000000 times instead, the PV guest is a little
> faster than PVHVM.

gettimeofday() is often a vsyscall on native/HVM Linux (i.e. it doesn't
actually make a system call), and I'm not sure that's the case on PV.
You could try coding up an actual system call (in assembly), or using a
libc wrapper that always makes a system call (I think getppid() is a
good choice, but you should use strace to confirm it's making system
calls).

> So it seems that the PV Linux guest has poor performance in the context
> switch case.

By "context switch" people usually mean changing from one process to
another, which is not what's happening here.

> How can I tune this, or is there any plan to fix this issue?

That depends -- what's your actual goal? Do you actually care about
gettimeofday() performance, or is there some other workload that's
running slowly for you?

Tim.
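[A minimal sketch of the kind of microbenchmark Tim suggests, timing
getppid() as a syscall that on most libcs always enters the kernel. The
iteration count and timing code are illustrative, not taken from the
thread; use strace to confirm getppid() really makes system calls on
your glibc.]

/* Time a syscall that is not serviced by the vDSO/vsyscall page.
 * Comparing the per-call cost between a 64-bit PV guest, a 32-bit PV
 * guest and a PVHVM guest isolates the syscall-entry overhead. */
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    struct timeval start, end;
    long i, iterations = 10000000;

    gettimeofday(&start, 0);
    for (i = 0; i < iterations; i++)
        getppid();                       /* should trap into the kernel */
    gettimeofday(&end, 0);

    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_usec - start.tv_usec) / 1000000.0;
    printf("%ld getppid() calls in %.3f s (%.0f ns per call)\n",
           iterations, secs, secs / iterations * 1e9);
    return 0;
}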
Thanks for your quick response.

Yes, it's a 64-bit PV domU, and dom0 is also 64-bit. In this case, can a
32-bit PAE PV domU get better performance even though the hardware is
x86_64 (given that there are not enough ring levels for the
hypervisor/kernel/user split)? Do I need to change the dom0 OS from
64-bit to 32-bit?

Another question: in a PAE PV kernel, does PSE need to be enabled? (I
mean PAE 2MB pages; I find that PSE is always cleared by the Xen code.)

2012/7/1 Pasi Kärkkäinen <pasik@iki.fi>

> Is this a 64-bit PV domU? If yes, then try with a 32-bit PAE PV kernel.
>
> 64-bit PV has a performance hit due to the x86_64 (amd64) architecture
> design: there are not enough ring levels in the amd64 spec for the
> hypervisor/kernel/user split, so syscalls from usermode need to be
> trapped by the hypervisor.
>
> 32-bit x86 does not have this limitation, i.e. x86 has enough ring levels.
>
> If you need a 64-bit VM, then it's better to use Xen PVHVM for this kind
> of workload with a lot of syscalls.
>
> -- Pasi
Hello,

I see -- the gettimeofday() behavior depends on the glibc configuration
macros and how it is compiled. I was just using gettimeofday() to
demonstrate the issue that 64-bit PV domU performance is very poor.
In fact, the performance is poor when calling read, write, fork, exec
and almost all system calls.

A few days ago someone reported a related issue to xen-devel, that the
fork system call has very poor performance. The link is:
http://lists.xen.org/archives/html/xen-devel/2012-06/msg01010.html

2012/7/1 Tim Deegan <tim@xen.org>

> gettimeofday() is often a vsyscall on native/HVM Linux (i.e. it doesn't
> actually make a system call), and I'm not sure that's the case on PV.
> You could try coding up an actual system call (in assembly), or using a
> libc wrapper that always makes a system call (I think getppid() is a
> good choice, but you should use strace to confirm it's making system
> calls).
>
> By "context switch" people usually mean changing from one process to
> another, which is not what's happening here.
>
> That depends -- what's your actual goal? Do you actually care about
> gettimeofday() performance, or is there some other workload that's
> running slowly for you?
>
> Tim.
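[To make the dependence on the vDSO/vsyscall path concrete, a small
sketch that times the libc gettimeofday() path against a forced real
system call. The loop count is illustrative, and syscall(2) with
SYS_gettimeofday is assumed to be available on the guest.]

/* Compare the libc gettimeofday() path, which may be satisfied in
 * userspace via the vDSO/vsyscall page, against SYS_gettimeofday
 * issued through syscall(2), which always enters the kernel. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

static double secs_between(struct timeval a, struct timeval b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_usec - a.tv_usec) / 1000000.0;
}

int main(void)
{
    struct timeval t, start, end;
    long i, n = 10000000;

    gettimeofday(&start, 0);
    for (i = 0; i < n; i++)
        gettimeofday(&t, 0);                 /* libc path, possibly no trap */
    gettimeofday(&end, 0);
    printf("libc gettimeofday(): %.3f s\n", secs_between(start, end));

    gettimeofday(&start, 0);
    for (i = 0; i < n; i++)
        syscall(SYS_gettimeofday, &t, 0);    /* forced kernel entry */
    gettimeofday(&end, 0);
    printf("SYS_gettimeofday:    %.3f s\n", secs_between(start, end));

    return 0;
}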
At 22:26 +0800 on 01 Jul (1341181569), Zhou Jacky wrote:
> Hello,
>
> I see -- the gettimeofday() behavior depends on the glibc configuration
> macros and how it is compiled. I was just using gettimeofday() to
> demonstrate the issue that 64-bit PV domU performance is very poor.
> In fact, the performance is poor when calling read, write, fork, exec
> and almost all system calls.

Yes, system calls from 64-bit PV guests are expensive. If your guest
doesn't have a lot of memory (less than about 4GB) it's probably faster
to use a 32-bit kernel, which doesn't have this problem. If it has a
lot of memory, the cost of extra pagetable manipulations may mean that
the 32-bit kernel is actually slower, though.

If you have modern hardware with NPT/EPT, it may be faster to run your
guests in HVM mode with PV drivers (either 32-bit or 64-bit).

As always, it really depends on your workload, and micro-benchmarks
might not be a good predictor of full-system performance.

> A few days ago someone reported a related issue to xen-devel, that the
> fork system call has very poor performance. The link is:
> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01010.html

AIUI that's a particular issue with fork() on PV kernels because of how
the pagetables are updated; I'll leave it to more linuxy people to go
into more detail.

Tim.
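[A minimal sketch of a fork() microbenchmark along these lines; the
iteration count and timing code are illustrative. Running it in a
64-bit PV, 32-bit PV and PVHVM guest gives a rough comparison of the
per-fork cost, which includes the pagetable setup/teardown work.]

/* Time fork()+_exit()+waitpid() pairs; fork exercises the pagetable
 * construction paths that are reported to be expensive on PV guests. */
#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    struct timeval start, end;
    int i, n = 10000;

    gettimeofday(&start, 0);
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            _exit(0);               /* child exits immediately */
        } else if (pid > 0) {
            waitpid(pid, NULL, 0);  /* parent reaps the child */
        } else {
            perror("fork");
            return 1;
        }
    }
    gettimeofday(&end, 0);

    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_usec - start.tv_usec) / 1000000.0;
    printf("%d fork/wait pairs in %.3f s (%.1f us each)\n",
           n, secs, secs / n * 1e6);
    return 0;
}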
On Tue, Jul 3, 2012 at 11:37 AM, Tim Deegan <tim@xen.org> wrote:
> At 22:26 +0800 on 01 Jul (1341181569), Zhou Jacky wrote:
>> In fact, the performance is poor when calling read, write, fork, exec
>> and almost all system calls.
>
> Yes, system calls from 64-bit PV guests are expensive. If your guest
> doesn't have a lot of memory (less than about 4GB) it's probably faster
> to use a 32-bit kernel, which doesn't have this problem. If it has a
> lot of memory, the cost of extra pagetable manipulations may mean that
> the 32-bit kernel is actually slower, though.

IIRC, I think Stefano did some tests with kernel compile (a very
system-call-heavy workload), and found that a 32-bit PV kernel
outperformed 64-bit PV only below about 768MiB -- basically, above that
the cost of extra TLB flushes due to HIGHMEM outweighed the extra cost
of system calls.

 -George
On Tue, 3 Jul 2012, George Dunlap wrote:
> On Tue, Jul 3, 2012 at 11:37 AM, Tim Deegan <tim@xen.org> wrote:
>> Yes, system calls from 64-bit PV guests are expensive. If your guest
>> doesn't have a lot of memory (less than about 4GB) it's probably faster
>> to use a 32-bit kernel, which doesn't have this problem. If it has a
>> lot of memory, the cost of extra pagetable manipulations may mean that
>> the 32-bit kernel is actually slower, though.
>
> IIRC, I think Stefano did some tests with kernel compile (a very
> system-call-heavy workload), and found that a 32-bit PV kernel
> outperformed 64-bit PV only below about 768MiB -- basically, above that
> the cost of extra TLB flushes due to HIGHMEM outweighed the extra cost
> of system calls.

Actually I only tested 8GB guests, and I found that 64-bit PV guests
outperformed 32-bit PV guests
(http://www.slideshare.net/xen_com_mgr/6-stefano-spvhvm, slide 15).

The interesting result I found is that in the pbzip2 test (slide 17)
32-bit PV guests issue 10 times as many TLB flushes as 64-bit PV guests.