I'm looking into performance issues with HVM domains. After running some
micro benchmarks I see that network performance takes a big hit in HVM
domains. For example, on my test machine the netperf benchmark shows that
an HVM domain gets only 3% of the throughput of a paravirtualized domain.

In an effort to track down where the time is spent I applied the patches
for Xenoprof passive domain support. Here are the first 25 lines from
Xenoprof:

samples  %       app name                symbol name
289967   3.6271  pxen1-syms              vmx_asm_vmexit_handler
279554   3.4968  xen-unstable-syms       hypercall
249264   3.1179  vmlinux-2.6.16-xen0-up  do_select
246991   3.0895  vmlinux-2.6.16-xen0-up  hypercall_page
225975   2.8266  vmlinux-2.6.16-xen0-up  system_call
196799   2.4617  vmlinux-2.6.16-xen0-up  schedule
150136   1.8780  pvmlinux1-syms          pcnet32_wio_write_csr
138253   1.7293  pxen1-syms              vmx_io_instruction
131584   1.6459  vmlinux-2.6.16-xen0-up  __copy_from_user_ll
128703   1.6099  pxen1-syms              vmx_vmexit_handler
111488   1.3945  vmlinux-2.6.16-xen0-up  sys_times
91488    1.1444  vmlinux-2.6.16-xen0-up  __switch_to
90813    1.1359  pvmlinux1-syms          pcnet32_wio_read_csr
90768    1.1354  libc-2.3.5.so           __GI_memcpy
86011    1.0759  vmlinux-2.6.16-xen0-up  core_sys_select
85427    1.0686  xen-unstable-syms       do_update_descriptor
79002    0.9882  vmlinux-2.6.16-xen0-up  hypervisor_callback
75150    0.9400  pxen1-syms              evtchn_set_pending
69434    0.8685  vmlinux-2.6.16-xen0-up  get_page_from_freelist
67366    0.8426  xen-unstable-syms       __copy_from_user_ll
67019    0.8383  vmlinux-2.6.16-xen0-up  __copy_to_user_ll
65826    0.8234  xen-unstable-syms       evtchn_set_pending
65719    0.8220  pxen1-syms              hvm_wait_io
65706    0.8219  pxen1-syms              get_s_time
64974    0.8127  pxen1-syms              vmx_intr_assist

The first few lines from the brief report:

samples  |      %|
------------------
3134473   39.2076  vmlinux-2.6.16-xen0-up
1831782   22.9129  pxen1-syms
1472974   18.4247  xen-unstable-syms
 620624    7.7631  pvmlinux1-syms
 490539    6.1359  qemu-dm
 199750    2.4986  libc-2.3.5.so
 100124    1.2524  libpthread-2.3.5.so
  75757    0.9476  oprofiled

dom0 and the HVM domain (domain ID 1) are each running a uniprocessor
kernel. The vcpu for the HVM domain was pinned to the sibling hyperthread
of the same core on which dom0 is running to reduce the latencies of
memory access between cores and/or sockets. The vcpu in dom0 ran at about
83% utilized. The vcpu in the HVM domain ran at about 37.5% utilized.

I don't see any obvious problems in the Xenoprof output. Anyone with
experience with the inner workings of HVM domains care to comment on what
might be causing the network performance to suffer so much?

Steve D.
Steve Dobbelstein <steved@us.ibm.com> writes:

> I don't see any obvious problems in the Xenoprof output. Anyone with
> experience with the inner workings of HVM domains care to comment on what
> might be causing the network performance to suffer so much?

You'll never get good throughput from either an NE2000 or a pcnet32 model.

-Andi
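[For context, the cost of the emulated pcnet32 model comes down to programmed
I/O. Below is a minimal sketch of the access pattern suggested by the
pcnet32_wio_write_csr symbol in the profile above; the register offsets and
the user-space outw() are illustrative assumptions, not taken from the real
driver. The point is that each port write traps out of the guest and goes
through qemu-dm.]

    /* Hedged sketch of what a pcnet32 CSR write boils down to. The offsets
     * below are assumptions for illustration; the real values live in the
     * Linux pcnet32 driver. In an HVM guest each outw() is a port-I/O
     * instruction that causes a VM exit (vmx_io_instruction in the profile)
     * and a round trip through qemu-dm, so every CSR access is expensive. */
    #include <stdint.h>
    #include <sys/io.h>   /* outw(); shown user-space style for illustration */

    #define PCNET32_WIO_RDP 0x10  /* register data port (assumed offset)    */
    #define PCNET32_WIO_RAP 0x12  /* register address port (assumed offset) */

    static void pcnet32_wio_write_csr(unsigned long ioaddr, int index,
                                      uint16_t val)
    {
        outw(index, ioaddr + PCNET32_WIO_RAP);  /* VM exit: select the CSR */
        outw(val,   ioaddr + PCNET32_WIO_RDP);  /* VM exit: write its value */
    }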
Andi Kleen wrote:
> Steve Dobbelstein <steved@us.ibm.com> writes:
>
>> I don't see any obvious problems in the Xenoprof output. Anyone with
>> experience with the inner workings of HVM domains care to comment on
>> what might be causing the network performance to suffer so much?
>
> You'll never get good throughput from either an NE2000 or a pcnet32
> model.

That was a suspicion of mine, but I wasn't sure. I wonder whether qemu-dm
could emulate another adapter (if so, which one would be best?), or do we
just punt and go for para-virt drivers?

-Andrew
On Friday 05 May 2006 20:56, Andrew Theurer wrote:
> Andi Kleen wrote:
> > You'll never get good throughput from either an NE2000 or a pcnet32
> > model.
>
> That was a suspicion of mine, but I wasn't sure. I wonder whether qemu-dm
> could emulate another adapter (if so, which one would be best?)

I looked at the specs some time ago, but all the promising adapters
(widely used, good features, not too broken a design) had very restrictive
license terms on their specifications, when the specifications were
available at all. They tended to allow using the specification only to
write drivers for the real hardware.

> or do we just punt and go for para-virt drivers?

That's the easy and probably fastest way short term, but long term it's a
lot more work: you will need to write drivers for all the old and new
guest OSes you want to run in order to get good performance. It's also a
logistical problem, because you'll need to distribute and set up all these
drivers even once they are written.

-Andi
Andrew Theurer wrote:
> or do we just punt and go for para-virt drivers?

Andi Kleen wrote:
> That's the easy and probably fastest way short term, but long term it's
> a lot more work: you will need to write drivers for all the old and new
> guest OSes you want to run in order to get good performance.

Yes, but given a good back-end driver model, these should be relatively
simple, especially if a couple of examples were BSD-licensed to kick-start
the development of these drivers.

> It's also a logistical problem, because you'll need to distribute and
> set up all these drivers even once they are written.

I would think the performance benefit would greatly outweigh the setup
issues. But what do you mean by "distribute and set up"?

--
Randy Thelen
Network Appliance
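[For a sense of how thin such a para-virt front end is, here is a loose
sketch of the transmit request a split network driver hands to the back
end, paraphrasing Xen's public io/netif.h from memory; field names and
layout are assumptions that should be checked against the real header. The
guest grants the back end access to the page holding the packet and kicks
an event channel, with no emulated device registers to trap on.]

    /* Loose paraphrase of the split-driver transmit request in Xen's public
     * io/netif.h (verify names and layout against the real header). The
     * front end queues one of these on a shared ring and notifies an event
     * channel; the back end maps the granted page and transmits the packet.
     * No emulated registers, so no per-access VM exits. */
    #include <stdint.h>

    typedef uint32_t grant_ref_t;   /* grant-table reference to a guest page */

    struct netif_tx_request {
        grant_ref_t gref;           /* grant for the page holding the packet */
        uint16_t    offset;         /* offset of the data within that page   */
        uint16_t    flags;          /* e.g. checksum-offload hints           */
        uint16_t    id;             /* echoed back in the response           */
        uint16_t    size;           /* packet size in bytes                  */
    };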
Randy Thelen <rthelen@netapp.com> writes:

> > That's the easy and probably fastest way short term, but long term
> > it's a lot more work: you will need to write drivers for all the old
> > and new guest OSes you want to run in order to get good performance.
>
> Yes, but given a good back-end driver model, these should be relatively
> simple, especially if a couple of examples were BSD-licensed to
> kick-start the development of these drivers.

Even if each one is simple by itself, it will be a logistical nightmare.
Or do you know how to write drivers for all the zillion versions of old
x86 OSes people might want to run under the hypervisor? Probably just
getting a development environment for each of these would be
"interesting".

> > It's also a logistical problem, because you'll need to distribute and
> > set up all these drivers even once they are written.
>
> I would think the performance benefit would greatly outweigh the setup
> issues.

I would expect an optimized modern NIC model to be not that much worse
than a pure para-virtual driver.

> But what do you mean by "distribute and set up"?

If the OS doesn't have a driver already, you need to distribute the
drivers in a form that can be used during installation.

-Andi