We started testing a Linux dom0 at up to 16-way (4 sockets, dual-core, with HT) and began to see some serious scaling issues compared to scaling bare-metal Linux to 16-way. We took some profiles and saw that functions in xen/arch/x86/mm.c were using disproportionately more CPU time as we scaled up the number of CPUs. Taking a quick look at those functions (for example, update_va_mapping and do_mmu_update), it became fairly obvious that the locking probably does not scale: it appears we take a domain-wide lock in many of these functions. This, IMO, causes a serious problem when running an SMP domain which happens to page fault a lot.

So I got to thinking: just how much protection do we really need in these functions? The OS should already provide quite a bit of protection for page table writes. Is Xen imposing additional, possibly unnecessary, protection here? So I made some changes to when we lock/unlock in most of these functions in mm.c (patch attached).

Warning: I am pretty much taking a shot in the dark here. I do not know this code nearly well enough to say this is the right thing to do. However, I can say without a doubt that the changes make a significant difference in performance:

benchmark        throughput increase with lock reduction
SDET             19%
reaim_shared     65%
reaim_fserver    16%

Below are per-function ratios of CPU time, rev8830/rev8830-lock-reduction (derived from oprofile diffs):

SDET:

9.84/1  restore_all_guest
1.45/1  mod_l1_entry
2.59/1  do_softirq
1.63/1  test_guest_events
1.09/1  syscall_enter
1.35/1  propagate_page_fault
1.18/1  process_guest_except
1.13/1  timer_softirq_action
1.04/1  alloc_page_type
1.05/1  revalidate_l1
1.08/1  do_set_segment_base
1.10/1  get_s_time
1.19/1  __context_switch
1.09/1  switch_to_kernel
1.11/1  FLT4
1.62/1  xen_l3_entry_update
1.27/1  xen_invlpg_mask

reaim_shared:

1.43/1  do_update_va_mapping
1.44/1  do_page_fault
1.47/1  do_mmu_update
6.75/1  restore_all_guest
1.43/1  do_mmuext_op
1.37/1  sedf_do_schedule
1.20/1  mod_l1_entry
2.46/1  do_softirq
1.27/1  t_timer_fn
1.34/1  do_set_segment_base
1.20/1  timer_softirq_action
1.24/1  process_guest_except
1.12/1  timer_interrupt
1.14/1  evtchn_send

reaim_fserver:

1.16/1  do_update_va_mapping
1.13/1  do_page_fault
8.41/1  restore_all_guest
1.17/1  do_mmu_update
1.56/1  mod_l1_entry
2.48/1  do_softirq
1.02/1  do_mmuext_op
1.14/1  sedf_do_schedule
1.12/1  t_timer_fn
1.23/1  do_set_segment_base
1.11/1  device_not_available
1.11/1  timer_softirq_action
1.13/1  process_guest_except
1.20/1  timer_interrupt
1.15/1  copy_from_user
1.11/1  propagate_page_fault

Any comments greatly appreciated.

-Andrew

<signed-off-by: habanero@us.ibm.com>
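For reference, the coarse-grained pattern being described looks roughly like the sketch below: the MMU hypercalls in mm.c serialise every VCPU of a domain on a single recursive per-domain lock. This is a simplified illustration only, assuming the LOCK_BIGLOCK()/UNLOCK_BIGLOCK() macros of this era's tree; it is not the attached patch, and the loop body is paraphrased.

    /* Sketch of the coarse-grained locking pattern in xen/arch/x86/mm.c
     * (simplified; the real function also handles foreign domains,
     * machphys updates and preemption). */
    int do_mmu_update(struct mmu_update *ureqs, unsigned int count,
                      unsigned int *pdone, unsigned int foreigndom)
    {
        struct domain *d = current->domain;
        unsigned int i;
        int rc = 0;

        LOCK_BIGLOCK(d);        /* one lock for the whole domain */

        for ( i = 0; i < count; i++ )
        {
            /* ... validate and apply each page-table update here
             * (mod_l1_entry() and friends).  Every VCPU of an SMP
             * guest doing page-table maintenance funnels through
             * this same per-domain lock. */
        }

        UNLOCK_BIGLOCK(d);
        return rc;
    }

do_update_va_mapping and do_mmuext_op follow the same pattern, which is why an SMP guest that page faults heavily ends up serialised exactly where the profiles above show the time going.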
Keir Fraser
2006-Feb-16 08:54 UTC
Re: [Xen-devel] scaling problem with writable pagetables
On 15 Feb 2006, at 18:49, Andrew Theurer wrote:

> Below are per-function ratios of CPU time,
> rev8830/rev8830-lock-reduction (derived from oprofile diffs):
>
> SDET:
>
> 9.84/1  restore_all_guest

Kind of odd, since that function contains no locking. Perhaps VCPUs are blocked so long that, by the time they get the CPU, there is an event pending and the hypercall gets preempted?

What do the perfctr numbers look like for #hypercalls and #exceptions? Also worth adding one to __hypercall_create_continuation as that'll count #preemptions.

 -- Keir
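For reference, adding such a counter would look roughly like the two-line sketch below. This assumes Xen's software performance counters of this era (a PERFCOUNTER_CPU() entry in perfc_defn.h, the perfc_incrc() macro, a hypervisor built with perfc=y, and readout via xenperf); the counter name is illustrative, not an existing one, and this is not a tested patch.

    /* 1. Declare a per-CPU counter in perfc_defn.h (the name
     *    "hypercall_continuations" is illustrative): */
    PERFCOUNTER_CPU(hypercall_continuations, "hypercall continuations")

    /* 2. Increment it at the top of __hypercall_create_continuation(),
     *    so every preempted-and-continued hypercall bumps the count: */
    perfc_incrc(hypercall_continuations);

Comparing that count against the #hypercalls counter would show what fraction of hypercalls are being preempted and re-entered, which is the ratio Keir's hypothesis about restore_all_guest turns on.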
Andrew Theurer
2006-Feb-16 13:41 UTC
Re: [Xen-devel] scaling problem with writable pagetables
Keir Fraser wrote:

> On 15 Feb 2006, at 18:49, Andrew Theurer wrote:
>
>> Below are per-function ratios of CPU time,
>> rev8830/rev8830-lock-reduction (derived from oprofile diffs):
>>
>> SDET:
>>
>> 9.84/1  restore_all_guest
>
> Kind of odd, since that function contains no locking. Perhaps VCPUs
> are blocked so long that, by the time they get the CPU, there is an
> event pending and the hypercall gets preempted?

I was a bit surprised at that, too.

> What do the perfctr numbers look like for #hypercalls and #exceptions?
> Also worth adding one to __hypercall_create_continuation as that'll
> count #preemptions.

I have not taken perfctr numbers yet, but I will today.

Thanks,

-Andrew