I'm looking into performance issues with HVM domains. After running some
microbenchmarks I see that network performance takes a big hit in HVM
domains. For example, on my test machine the netperf benchmark shows that
an HVM domain gets only 3% of the throughput of a paravirtualized domain.
In an effort to track down where the time is spent I applied the patches
for Xenoprof passive domain support. Here are the first 25 lines from
Xenoprof.
samples  %       app name                symbol name
 289967  3.6271  pxen1-syms              vmx_asm_vmexit_handler
 279554  3.4968  xen-unstable-syms       hypercall
 249264  3.1179  vmlinux-2.6.16-xen0-up  do_select
 246991  3.0895  vmlinux-2.6.16-xen0-up  hypercall_page
 225975  2.8266  vmlinux-2.6.16-xen0-up  system_call
 196799  2.4617  vmlinux-2.6.16-xen0-up  schedule
 150136  1.8780  pvmlinux1-syms          pcnet32_wio_write_csr
 138253  1.7293  pxen1-syms              vmx_io_instruction
 131584  1.6459  vmlinux-2.6.16-xen0-up  __copy_from_user_ll
 128703  1.6099  pxen1-syms              vmx_vmexit_handler
 111488  1.3945  vmlinux-2.6.16-xen0-up  sys_times
  91488  1.1444  vmlinux-2.6.16-xen0-up  __switch_to
  90813  1.1359  pvmlinux1-syms          pcnet32_wio_read_csr
  90768  1.1354  libc-2.3.5.so           __GI_memcpy
  86011  1.0759  vmlinux-2.6.16-xen0-up  core_sys_select
  85427  1.0686  xen-unstable-syms       do_update_descriptor
  79002  0.9882  vmlinux-2.6.16-xen0-up  hypervisor_callback
  75150  0.9400  pxen1-syms              evtchn_set_pending
  69434  0.8685  vmlinux-2.6.16-xen0-up  get_page_from_freelist
  67366  0.8426  xen-unstable-syms       __copy_from_user_ll
  67019  0.8383  vmlinux-2.6.16-xen0-up  __copy_to_user_ll
  65826  0.8234  xen-unstable-syms       evtchn_set_pending
  65719  0.8220  pxen1-syms              hvm_wait_io
  65706  0.8219  pxen1-syms              get_s_time
  64974  0.8127  pxen1-syms              vmx_intr_assist
The first few lines from the brief report:
samples  %        image name
------------------------------
3134473  39.2076  vmlinux-2.6.16-xen0-up
1831782  22.9129  pxen1-syms
1472974  18.4247  xen-unstable-syms
 620624   7.7631  pvmlinux1-syms
 490539   6.1359  qemu-dm
 199750   2.4986  libc-2.3.5.so
 100124   1.2524  libpthread-2.3.5.so
  75757   0.9476  oprofiled
dom0 and the HVM domain (domain ID 1) are each running a uniprocessor
kernel. The vcpu for the HVM domain was pinned to the sibling hyperthread
of the same core on which dom0 is running to reduce the latencies of memory
access between cores and/or sockets.
The vcpu in dom0 ran at about 83% utilization; the vcpu in the HVM domain
ran at about 37.5% utilization.
I don't see any obvious problems in the Xenoprof output. Anyone with
experience with the inner workings of HVM domains care to comment on what
might be causing the network performance to suffer so much?
Steve D.

Steve Dobbelstein <steved@us.ibm.com> writes:

> I don't see any obvious problems in the Xenoprof output. Anyone with
> experience with the inner workings of HVM domains care to comment on what
> might be causing the network performance to suffer so much?

You'll never get good throughput from either an NE2000 or a pcnet32 model.

-Andi
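
That matches the profile above: pcnet32_wio_write_csr and
pcnet32_wio_read_csr sit high in the guest samples (pvmlinux1-syms), with
vmx_io_instruction high in Xen's (pxen1-syms). Roughly, the pcnet32
programmed-I/O register path looks like the sketch below, paraphrased from
the Linux pcnet32 driver (the port offsets here are from memory and may not
be exact). Every CSR access is a pair of port operations, and under HVM
each port access causes a VM exit that is forwarded to qemu-dm for
emulation, so even a single register write costs two exits plus a round
trip through the device model.

    /*
     * Rough sketch of the pcnet32 programmed-I/O register path, paraphrased
     * from the Linux driver; the port offsets are assumed and may not be
     * exact.  The point is the shape: every CSR access is two I/O port
     * operations, and under HVM each one traps to Xen (vmx_io_instruction)
     * and is handed to qemu-dm for emulation.
     */
    #include <linux/types.h>
    #include <asm/io.h>

    #define PCNET32_WIO_RDP  0x10   /* register data port (assumed offset)    */
    #define PCNET32_WIO_RAP  0x12   /* register address port (assumed offset) */

    static void pcnet32_wio_write_csr(unsigned long addr, int index, u16 val)
    {
            outw(index, addr + PCNET32_WIO_RAP);  /* exit #1: select the CSR  */
            outw(val, addr + PCNET32_WIO_RDP);    /* exit #2: write the value */
    }

    static u16 pcnet32_wio_read_csr(unsigned long addr, int index)
    {
            outw(index, addr + PCNET32_WIO_RAP);  /* exit #1: select the CSR */
            return inw(addr + PCNET32_WIO_RDP);   /* exit #2: read the value */
    }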

Andi Kleen wrote:
> Steve Dobbelstein <steved@us.ibm.com> writes:
>> I don't see any obvious problems in the Xenoprof output. Anyone with
>> experience with the inner workings of HVM domains care to comment on what
>> might be causing the network performance to suffer so much?
>
> You'll never get good throughput from either an NE2000 or a pcnet32 model.

That was a suspicion of mine, but I wasn't sure. I wonder if qemu-dm could
emulate another adapter (if so, which one would be best?), or do we just
punt and go for para-virt drivers.

-Andrew

On Friday 05 May 2006 20:56, Andrew Theurer wrote:
> That was a suspicion of mine, but I wasn't sure. I wonder if qemu-dm could
> emulate another adapter (if so, which one would be best?)

I looked at specs some time ago, but all the promising adapters (widely
used, good features, not too broken a design) had very restrictive license
terms on their specifications, if the specifications were available at all.
They tended to allow using the specification only to write drivers for the
hardware.

> or do we just punt and go for para-virt drivers.

That's the easy and probably fastest way short term, but long term it's a
lot more work (you will need to write drivers for all the old and new guest
OSes you want to run to get good performance). It's also a logistical
problem, because you'll need to distribute and set up all these drivers
even if they are written.

-Andi

Andrew Theurer wrote:
> or do we just punt and go for para-virt drivers.

Andi Kleen wrote:
> That's the easy and probably fastest way short term, but long term it's a
> lot more work (you will need to write drivers for all the old and new
> guest OSes you want to run to get good performance).

Yes, but given a good back-end driver model, these should be relatively
simple, especially if a couple of examples were BSD licensed to kick start
the development of these drivers.

> It's also a logistical problem, because you'll need to distribute and set
> up all these drivers even if they are written.

I would think the performance benefit would greatly outweigh the setup
issues. But what do you mean by 'distribute and set up'?

--
Randy Thelen
Network Appliance
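
For a sense of why a para-virt frontend can be both simple and fast, here
is a deliberately simplified sketch of the idea. It is not the real
netfront/netback interface; the ring layout, field names, and the
notify_backend() hook are invented for illustration. The frontend places
transmit descriptors on a ring in memory shared with the backend and sends
one event-channel notification per batch, instead of trapping to the device
model on every register access the way an emulated PIO NIC does.

    /*
     * Simplified sketch of a paravirtual transmit path: descriptors go on a
     * shared ring, and the backend is notified once per batch.  All names
     * and layouts here are hypothetical, for illustration only.
     */
    #include <stdint.h>

    #define RING_SIZE 256u                /* power of two, so masking works */

    struct tx_request {                   /* hypothetical descriptor layout */
        uint32_t gref;                    /* grant reference for the frame's page */
        uint16_t offset;                  /* offset of the frame within the page  */
        uint16_t size;                    /* frame length in bytes                */
    };

    struct tx_ring {                      /* lives in a page shared with the backend */
        volatile uint32_t req_prod;       /* frontend producer index */
        volatile uint32_t rsp_cons;       /* frontend consumer index */
        struct tx_request ring[RING_SIZE];
    };

    /* Hypothetical hook: one event-channel notification (one hypercall). */
    extern void notify_backend(int evtchn);

    /* Queue a batch of frames, then notify the backend exactly once. */
    void tx_batch(struct tx_ring *r, int evtchn,
                  const struct tx_request *reqs, unsigned int n)
    {
        uint32_t prod = r->req_prod;
        unsigned int i;

        for (i = 0; i < n; i++)
            r->ring[prod++ & (RING_SIZE - 1)] = reqs[i];

        __sync_synchronize();             /* publish the requests first */
        r->req_prod = prod;

        notify_backend(evtchn);           /* one notification for the whole batch */
    }

The per-frame cost is then a few memory writes plus an amortized
notification, rather than several trapped port accesses per register as in
the emulated pcnet32 path.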

Randy Thelen <rthelen@netapp.com> writes:

>> That's the easy and probably fastest way short term, but long term it's a
>> lot more work (you will need to write drivers for all the old and new
>> guest OSes you want to run to get good performance).
>
> Yes, but given a good back-end driver model, these should be relatively
> simple, especially if a couple of examples were BSD licensed to kick start
> the development of these drivers.

Even if each one is simple by itself, it will be a logistical nightmare. Or
do you know how to write drivers for all the zillion versions of old x86
OSes people might want to run under the hypervisor? Probably just getting a
development environment for each of these would be "interesting".

>> It's also a logistical problem, because you'll need to distribute and set
>> up all these drivers even if they are written.
>
> I would think the performance benefit would greatly outweigh the setup
> issues.

I would expect an optimized modern NIC model to be not that much worse than
a pure para-virtual driver.

> But what do you mean by 'distribute and set up'?

If the OS doesn't have a driver already, you need to distribute the drivers
in a form that can be used during installation.

-Andi