Andrew Theurer
2006-Jun-05 21:45 UTC
[Xen-devel] Why is 'emulate' as good as writable PT's?
We have been doing some scalability work, and we noticed that forcing 'emulate' in arch/x86/mm.c achieves the same performance on 1-way dom0. For example:

xen-unstable, changeset 10200, i386 with PAE, 1-way

benchmark       xen0   xen0+emulate
-------------   ----   ------------
reaim_fserver   4421   4426
reaim_compute   2555   2531
SDET            4759   4810

The reaim benchmarks probably don't have much fork(), where I'd expect writable page tables to help, but SDET has a ton of fork+exec. Could there be situations where we are inadvertently triggering a writable page table, where we should just be doing an update_va_mapping()?

-Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2006-Jun-05 22:17 UTC
RE: [Xen-devel] Why is 'emulate' as good as writable PT's?
> Could there be situations where we are inadvertently triggering a
> writable page table, where we should just be doing an update_va_mapping()?

Almost certainly. Singleton (or small batch) updates should not be using writeable pagetables, and should use update_va_mapping (or mmu_update if the VA isn't known or may not be mapped).

~18 months ago Rolf wrote and checked in profile code to collect a histogram of the number of entries found to be modified when writeable pagetables are flushed. At the time there was a big spike at '1' which was fixed, but with all the various Linux version upgrades it likely needs revisiting.

The profile code also records the EIP that caused the writeable pagetables operation, so if you print out the value a few times you'll quickly find the culprit.

Thanks,
Ian
Andrew Theurer
2006-Jun-05 22:29 UTC
Re: [Xen-devel] Why is 'emulate' as good as writable PT's?
Ian Pratt wrote:
>> Could there be situations where we are inadvertently triggering a
>> writable page table, where we should just be doing an update_va_mapping()?
>
> Almost certainly. Singleton (or small batch) updates should not be using
> writeable pagetables, and should use update_va_mapping (or mmu_update if
> the VA isn't known or may not be mapped).
>
> ~18 months ago Rolf wrote and checked in profile code to collect a
> histogram of the number of entries found to be modified when writeable
> pagetables are flushed.
> At the time there was a big spike at '1' which was fixed, but with all
> the various Linux version upgrades it likely needs revisiting.
>
> The profile code also records the EIP that caused the writeable
> pagetables operation, so if you print out the value a few times you'll
> quickly find the culprit.

Thanks! It looks like the histogram and EIP logs in ptwr_flush are still there, so we'll run again with perfc=y and see if we can pinpoint the culprit.

-Andrew
Andrew Theurer
2006-Jun-06 20:28 UTC
Re: [Xen-devel] Why is 'emulate' as good as writable PT's?
Ian Pratt wrote:
>> Could there be situations where we are inadvertently triggering a
>> writable page table, where we should just be doing an update_va_mapping()?
>
> Almost certainly. Singleton (or small batch) updates should not be using
> writeable pagetables, and should use update_va_mapping (or mmu_update if
> the VA isn't known or may not be mapped).
>
> ~18 months ago Rolf wrote and checked in profile code to collect a
> histogram of the number of entries found to be modified when writeable
> pagetables are flushed.
> At the time there was a big spike at '1' which was fixed, but with all
> the various Linux version upgrades it likely needs revisiting.
>
> The profile code also records the EIP that caused the writeable
> pagetables operation, so if you print out the value a few times you'll
> quickly find the culprit.
>
> Thanks,
> Ian

Yes, we definitely have a problem here. Tons of flushes with modified=1, and lots with <=10. The three benchmarks all seem to hit the same areas. Here is the output from running SDET, with snippets from System.map mixed in:

Out of a total of 19601 writable PT updates:

c01522b0 <=1 40 <=10 0 <=50 0 <=100 0 <=512 0
--------
c0151e90 T sys_mprotect
c01524d3 t .text.lock.mprotect

c014ed77 <=1 3418 <=10 4853 <=50 1674 <=100 70 <=512 0
--------
c014e84e T copy_page_range
c014efc6 T free_pgtables

c01522ab <=1 3728 <=10 0 <=50 0 <=100 0 <=512 0
--------
c0151e90 T sys_mprotect
c01524d3 t .text.lock.mprotect

c014b809 <=1 3752 <=10 1654 <=50 302 <=100 10 <=512 3
--------
c014b300 T unmap_vmas
c014b9ba T zap_page_range

c014b80b <=1 32 <=10 30 <=50 30 <=100 1 <=512 0
--------
c014b300 T unmap_vmas
c014b9ba T zap_page_range

-Andrew
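[Editorial sketch, not part of the original message.] The histogram above can be summarised per kernel function by resolving each faulting EIP to the nearest preceding symbol in the System.map snippets. The Python below is a minimal illustration under stated assumptions: the helper names (`symbol_for`, `per_symbol`) are hypothetical, the addresses and counts are copied from the posted output, each bucket is treated as a separate bin, and the listed symbols are assumed to bracket each EIP with no unlisted symbol in between.

```python
import bisect

# Symbol start addresses, copied from the System.map snippets in the message.
SYMBOLS = sorted([
    (0xC014B300, "unmap_vmas"),
    (0xC014B9BA, "zap_page_range"),
    (0xC014E84E, "copy_page_range"),
    (0xC014EFC6, "free_pgtables"),
    (0xC0151E90, "sys_mprotect"),
    (0xC01524D3, ".text.lock.mprotect"),
])
ADDRS = [addr for addr, _ in SYMBOLS]

# Faulting EIP -> flush counts per bucket (<=1, <=10, <=50, <=100, <=512),
# exactly as posted. Treating the buckets as disjoint bins is an assumption.
FLUSHES = {
    0xC01522B0: [40, 0, 0, 0, 0],
    0xC014ED77: [3418, 4853, 1674, 70, 0],
    0xC01522AB: [3728, 0, 0, 0, 0],
    0xC014B809: [3752, 1654, 302, 10, 3],
    0xC014B80B: [32, 30, 30, 1, 0],
}


def symbol_for(eip):
    """Return the name of the last listed symbol starting at or below eip."""
    index = bisect.bisect_right(ADDRS, eip) - 1
    if index < 0:
        raise ValueError(f"no symbol at or below {eip:#x}")
    return SYMBOLS[index][1]


def per_symbol():
    """Map symbol name -> (flushes with modified<=1, all flushes)."""
    totals = {}
    for eip, counts in FLUSHES.items():
        name = symbol_for(eip)
        single, everything = totals.get(name, (0, 0))
        totals[name] = (single + counts[0], everything + sum(counts))
    return totals


if __name__ == "__main__":
    for name, (single, total) in sorted(per_symbol().items()):
        share = 100 * single / total
        print(f"{name:<16} total={total:5d}  modified<=1: {single:5d} ({share:.0f}%)")
```

Note that the listed rows add up to 19597 flushes, slightly fewer than the reported total of 19601; the sketch works only from the rows shown.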
Keir Fraser
2006-Jun-06 21:14 UTC
Re: [Xen-devel] Why is 'emulate' as good as writable PT's?
On 6 Jun 2006, at 21:28, Andrew Theurer wrote:
> Yes, we definitely have a problem here. Tons of flushes with
> modified=1, and lots with <=10. The three benchmarks all seem to hit
> the same areas. Here is the output from running SDET, with snippets
> from System.map mixed in:

Is this PAE? SMP guest?

Do you know much about the SDET benchmark? For example, do you know how big the mprotect() calls it makes are likely to be? If VMAs are small and fairly sparse then the writable pagetable batching won't be a win.

 -- Keir
Andrew Theurer
2006-Jun-06 22:02 UTC
Re: [Xen-devel] Why is 'emulate' as good as writable PT's?
Keir Fraser wrote:
>
> On 6 Jun 2006, at 21:28, Andrew Theurer wrote:
>
>> Yes, we definitely have a problem here. Tons of flushes with
>> modified=1, and lots with <=10. The three benchmarks all seem to hit
>> the same areas. Here is the output from running SDET, with snippets
>> from System.map mixed in:
>
> Is this PAE? SMP guest?
>
> Do you know much about the SDET benchmark? For example, do you know
> how big the mprotect() calls it makes are likely to be? If VMAs are
> small and fairly sparse then the writable pagetable batching won't be
> a win.

1-way SMP kernel, PAE. I'm not sure about the mprotect() calls. SDET basically calls a lot of utilities like ps, gcc, ispell, etc.

Is it feasible to "xen-ify" unmap_vmas() and copy_page_range(), such that we use explicit hypercalls instead of faulting on the writes?

-Andrew
Andrew Theurer
2006-Jun-08 16:05 UTC
Re: [Xen-devel] Why is 'emulate' as good as writable PT's?
Keir Fraser wrote:
>
> On 6 Jun 2006, at 21:28, Andrew Theurer wrote:
>
>> Yes, we definitely have a problem here. Tons of flushes with
>> modified=1, and lots with <=10. The three benchmarks all seem to hit
>> the same areas. Here is the output from running SDET, with snippets
>> from System.map mixed in:
>
> Is this PAE? SMP guest?
>
> Do you know much about the SDET benchmark? For example, do you know
> how big the mprotect() calls it makes are likely to be? If VMAs are
> small and fairly sparse then the writable pagetable batching won't be
> a win.
>
> -- Keir

I was wondering, perhaps we are not just triggering writable pagetables when we shouldn't, but maybe we are flushing them back too early. I added some Xen perf counters to get an idea of why we are flushing back wtpt's (run on SDET again):

                                    modified:    0  <=10  <=20  <=30  <=40  <=50
 1 writable pt updates  T=1086                   0   612   194   111    49    85
 2 ptwr_flush: called from ptwr_emulated_update because wtpt exists  T=0
 3 ptwr_flush: called from ptwr_do_page_fault because wtpt is already used  T=338
 4 ptwr_flush: called from spurious_page_fault  T=0
 5 ptwr_flush: called from fixup_page_fault  T=0
 6 ptwr_flush: called from cleanup_wpt, do_mmuext_op (active)  T=467
 7 ptwr_flush: called from cleanup_wpt, do_mmuext_op (inactive)  T=0
 8 ptwr_flush: called from cleanup_wpt, update_va_mapping (active)  T=280
 9 ptwr_flush: called from cleanup_wpt, update_va_mapping (inactive)  T=0
10 ptwr_flush: called from cleanup_wpt, do_mmu_update (active)  T=1
11 ptwr_flush: called from cleanup_wpt, do_mmu_update (inactive)  T=0

line 2: I don't think we have a choice here, right? Not a big deal, as it's not happening anyway.

line 3: I think we can just goto emulate instead of flushing back the wtpt here, right? I've tried this, but no real difference in performance. Could we increase the number of wtpt's we keep track of, so we don't have to flush back or emulate?

line 6: We seem to call cleanup_writable_pagetables unconditionally here, and if either the active or the inactive page is in use, it gets flushed back. Do we always need to do this?

line 8: We also call cleanup_writable_pagetables unconditionally here. Do the wtpt's always need this to happen? Is it possible the update_va_mapping call is for an address space which does not affect the wtpt?

line 10: Not seeing many flushes here, so I guess it's not an issue.

Sorry if these questions seem odd. There's a good chance I am not "getting it" :)

Thanks,
-Andrew
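[Editorial sketch, not part of the original message.] The flush counters above can be tallied to see which call path accounts for most flushes. This is a minimal Python illustration using only the T= values posted; the dictionary labels and the `shares` helper are hypothetical names chosen here, not Xen identifiers.

```python
# Flush counts per call path, copied from lines 2-11 of the posted counters.
FLUSH_REASONS = {
    "ptwr_emulated_update (wtpt exists)": 0,
    "ptwr_do_page_fault (wtpt already used)": 338,
    "spurious_page_fault": 0,
    "fixup_page_fault": 0,
    "do_mmuext_op (active)": 467,
    "do_mmuext_op (inactive)": 0,
    "update_va_mapping (active)": 280,
    "update_va_mapping (inactive)": 0,
    "do_mmu_update (active)": 1,
    "do_mmu_update (inactive)": 0,
}

# Line 1 of the posted counters: total writable pt updates.
TOTAL_UPDATES = 1086


def shares(reasons):
    """Return each non-zero reason's fraction of all recorded flushes."""
    total = sum(reasons.values())
    return {name: count / total for name, count in reasons.items() if count}


if __name__ == "__main__":
    # The per-reason counts add up to the reported total.
    assert sum(FLUSH_REASONS.values()) == TOTAL_UPDATES
    ranked = sorted(shares(FLUSH_REASONS).items(), key=lambda item: -item[1])
    for name, fraction in ranked:
        print(f"{fraction:6.1%}  {name}")
```

On these numbers the per-reason counts sum exactly to T=1086, with do_mmuext_op, ptwr_do_page_fault, and update_va_mapping accounting for nearly all flushes.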
Ian Pratt
2006-Jun-12 09:15 UTC
RE: [Xen-devel] Why is 'emulate' as good as writable PT's?
> I was wondering, perhaps we are not just triggering writable pagetables
> when we shouldn't, but maybe we are flushing them back too early. I
> added some xen perf counters to get an idea of why we are flushing back
> wtpt's (run on SDET again):

Are these numbers taken on a uniprocessor guest (or dom0?)

> modified: 0 <=10 <=20 <=30 <=40 <=50
> 1 writable pt updates T=1086 0 612 194 111 49 85
> 2 ptwr_flush: called from ptwr_emulated_update because wtpt exists T=0
> 3 ptwr_flush: called from ptwr_do_page_fault because wtpt is already
> used T=338
> 4 ptwr_flush: called from spurious_page_fault T=0
> 5 ptwr_flush: called from fixup_page_fault T=0
> 6 ptwr_flush: called from cleanup_wpt, do_mmuext_op (active) T=467
> 7 ptwr_flush: called from cleanup_wpt, do_mmuext_op (inactive) T=0
> 8 ptwr_flush: called from cleanup_wpt, update_va_mapping (active) T=280
> 9 ptwr_flush: called from cleanup_wpt, update_va_mapping (inactive) T=0
> 10 ptwr_flush: called from cleanup_wpt, do_mmu_update (active) T=1
> 11 ptwr_flush: called from cleanup_wpt, do_mmu_update (inactive) T=0

> line 3: I think we can just goto emulate instead of flushing back the
> wtpt here, right? I've tried this, but no real difference in
> performance. Could we increase the number of wtpt's we keep track of,
> so we don't have to flush back or emulate?

This will happen as part of a fork when we move on to the next page in the PT. It should be harmless unless we're flopping back and forth.

> line 6: We seem to call cleanup_writable_pagetables unconditionally
> here, and if either of the active or inactive pages are used, they get
> flushed back. Do we always need to do this?

What's the op? Is it a TLB flush, invlpg, or cr3 load?

> line 8: Also call cleanup_writable_pagetables unconditionally here. Do
> the wtpt's always need this to happen? Is it possible the
> update_va_mapping call is for an address space which does not affect the
> wtpt?

It's interesting to understand what the interaction is here. I'd like to know

> line 10: Not seeing many flushes here, so I guess it's not an issue.
>
> Sorry if these questions seem odd. There's a good chance I am not
> "getting it" :)

This is useful work. It's been on our todo list to re-profile this on newer kernels. Once upon a time we had it quite nicely tuned...

Could you find out all the kernel EIPs that are triggering writeable pagetables with any frequency and list them for us? It might be good to turn everything into using update mmuop and then turn on direct writes just for the fork case, which is where we know we need it.

Thanks,
Ian
Andrew Theurer
2006-Jun-13 14:47 UTC
Re: [Xen-devel] Why is 'emulate' as good as writable PT's?
Ian Pratt wrote:
>> I was wondering, perhaps we are not just triggering writable pagetables
>> when we shouldn't, but maybe we are flushing them back too early. I
>> added some xen perf counters to get an idea of why we are flushing back
>> wtpt's (run on SDET again):
>
> Are these numbers taken on a uniprocessor guest (or dom0?)

Yes.

>> modified: 0 <=10 <=20 <=30 <=40 <=50
>> 1 writable pt updates T=1086 0 612 194 111 49 85
>> 2 ptwr_fl: called from ptwr_emulated_update because wtpt exists T=0
>> 3 ptwr_fl: called from ptwr_do_page_fault because wtpt is used T=338
>> 4 ptwr_fl: called from spurious_page_fault T=0
>> 5 ptwr_fl: called from fixup_page_fault T=0
>> 6 ptwr_fl: called from cleanup_wpt, do_mmuext_op (active) T=467
>> 7 ptwr_fl: called from cleanup_wpt, do_mmuext_op (inactive) T=0
>> 8 ptwr_fl: called from cleanup_wpt, update_va_mapping (active) T=280
>> 9 ptwr_fl: called from cleanup_wpt, update_va_mapping (inactive) T=0
>> 10 ptwr_flush: called from cleanup_wpt, do_mmu_update (active) T=1
>> 11 ptwr_flush: called from cleanup_wpt, do_mmu_update (inactive) T=0

>> line 3: I think we can just goto emulate instead of flushing back the
>> wtpt here, right? I've tried this, but no real difference in
>> performance. Could we increase the number of wtpt's we keep track of,
>> so we don't have to flush back or emulate?
>
> This will happen as part of a fork when we move on to the next page in
> the PT. It should be harmless unless we're flopping back and forth.

OK

>> line 6: We seem to call cleanup_writable_pagetables unconditionally
>> here, and if either of the active or inactive pages are used, they get
>> flushed back. Do we always need to do this?
>
> What's the op? Is it a TLB flush, invlpg, or cr3 load?

I don't know, but I'll find out.

>> line 8: Also call cleanup_writable_pagetables unconditionally here.
>> Do the wtpt's always need this to happen? Is it possible the
>> update_va_mapping call is for an address space which does not affect
>> the wtpt?
>
> It's interesting to understand what the interaction is here. I'd like to
> know
>
>> line 10: Not seeing many flushes here, so I guess it's not an issue.
>>
>> Sorry if these questions seem odd. There's a good chance I am not
>> "getting it" :)
>
> This is useful work. It's been on our todo list to re-profile this on
> newer kernels. Once upon a time we had it quite nicely tuned...
>
> Could you find out all the kernel EIPs that are triggering writeable
> pagetables with any frequency and list them for us. It might be good to
> turn everything into using update mmuop and then just turn on direct
> writes just for the fork case which is where we know need it.

I think I can do that. I'll just use something similar to the EIP logging Xen has for finding out what triggered the wtpt flushes.

Thanks,
Andrew