Bruce Rogers
2008-Jan-05 04:49 UTC
[Xen-devel] [PATCH] Fix performance problems with mprotect()
While working on a database scaling problem using a SAP database test suite we discovered that these enterprise level large shared memory databases are very heavy users of mprotect(), to the extent that the performance overhead in current Xenolinux impacts scaling beyond a few cpus quite badly. A single cpu run under Xen has a nominal impact, but scaling out to 8 cpus results in a performance of less than 10% of native throughput. Ouch! The problem was isolated to the relatively high overhead of how mprotect() updates ptes, and the lack of efficient batching opportunities as presently implemented. By adding a hypercall which batches pte updates such that hardware flags updates (accessed or dirty bits) are not lost, and redoing the performance critical portion of mprotect() to take advantage of that, performance has improved to where it is essentially equivalent to native performance with this SAP database test suite. I haven''t tested other database implementations, but I would imagine they would have similar results. This is obviously quite an improvement and should allow Xen to be deployed for more enterprise level workloads. There are actually 2 new hypercalls included in this patch - one that batches flags-only changes to pte''s within a single page table, and another which allows the full pte to be changed - both "atomically" as described above. I imagine there are other places in Linux that could benefit from these hypercalls, but these patches focus on the mprotect performance problem only. The hypervisor patch was mostly written and tested with the 3.0.4 era hypervisor that ships with SLES 10, and then ported and tested on current xen-unstable. The Linux patch was written and tested on 2.6.16 and made apply to the 2.6.18 tree without further testing. Feedback and comments are appreciated. Signed-off-by: Bruce Rogers <brogers@novell.com> - Bruce Rogers _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
John Levon
2008-Jan-05 13:35 UTC
Re: [Xen-devel] [PATCH] Fix performance problems with mprotect()
On Fri, Jan 04, 2008 at 09:49:41PM -0700, Bruce Rogers wrote:> While working on a database scaling problemThe changelog entry for this patch should have a suitable entry for http://wiki.xensource.com/xenwiki/APIChangelog, as suggested by Ian Jackson. (Regarding the changes, is there a sane way to detect whether the new API is present or not? IE a sensible unique errno return or something else.) regards john _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Bruce Rogers
2008-Jan-05 15:05 UTC
Re: [Xen-devel] [PATCH] Fix performance problems with mprotect()
I believe -ENOSYS, in this case at least, correctly identifies when the new code is not there. I didn''t try to bump any revision, or add an explicit feature flag somewhere. I would have preferred to do a one time detection by identity remapping some page and see if it succeeds and then just test a global flag to determine whether to use the new method or not, but it appeared to me at least to not be the Linux way (I am still somewhat new to Linux) and was unsure how issues such as migrating the domain to hypervisors which might be missing this feature is handled, so I went with this approach of always being able to fall back to the previously existing method. Ugly, I agree, but it works. I''ll look into the APIChangelog entry after more feedback rolls in. - Bruce>>> John Levon <levon@movementarian.org> 01/05/08 6:35 AM >>>On Fri, Jan 04, 2008 at 09:49:41PM -0700, Bruce Rogers wrote:> While working on a database scaling problemThe changelog entry for this patch should have a suitable entry for http://wiki.xensource.com/xenwiki/APIChangelog, as suggested by Ian Jackson. (Regarding the changes, is there a sane way to detect whether the new API is present or not? IE a sensible unique errno return or something else.) regards john _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Jan-06 23:22 UTC
Re: [Xen-devel] [PATCH] Fix performance problems with mprotect()
Thanks, I''ll need a little while to look into this patch. It''s fairly obviously not a candidate for 3.2.0 at this point anyway (merge window is long gone by). -- Keir On 5/1/08 04:49, "Bruce Rogers" <brogers@novell.com> wrote:> While working on a database scaling problem using a SAP database test suite we > discovered that these enterprise level large shared memory databases are very > heavy users of mprotect(), to the extent that the performance overhead in > current Xenolinux impacts scaling beyond a few cpus quite badly. A single > cpu run under Xen has a nominal impact, but scaling out to 8 cpus results in a > performance of less than 10% of native throughput. Ouch! > > The problem was isolated to the relatively high overhead of how mprotect() > updates ptes, and the lack of efficient batching opportunities as presently > implemented. By adding a hypercall which batches pte updates such that > hardware flags updates (accessed or dirty bits) are not lost, and redoing the > performance critical portion of mprotect() to take advantage of that, > performance has improved to where it is essentially equivalent to native > performance with this SAP database test suite. I haven''t tested other > database implementations, but I would imagine they would have similar results. > > This is obviously quite an improvement and should allow Xen to be deployed for > more enterprise level workloads. > > There are actually 2 new hypercalls included in this patch - one that batches > flags-only changes to pte''s within a single page table, and another which > allows the full pte to be changed - both "atomically" as described above. > > I imagine there are other places in Linux that could benefit from these > hypercalls, but these patches focus on the mprotect performance problem only. > > The hypervisor patch was mostly written and tested with the 3.0.4 era > hypervisor that ships with SLES 10, and then ported and tested on current > xen-unstable. > The Linux patch was written and tested on 2.6.16 and made apply to the 2.6.18 > tree without further testing. > > Feedback and comments are appreciated. > > Signed-off-by: Bruce Rogers <brogers@novell.com> > > - Bruce Rogers > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2008-Jan-07 15:03 UTC
Re: [Xen-devel] [PATCH] Fix performance problems with mprotect()
On the Xen patch: Both added case blocks in do_mmu_update() have a nested switch statement that seems redundant (i.e. the outer switch already only handle PGT_l1_page_table pages, but the inner switch check this same value again. The same two case block access addresses obtained from map_domain_page_with_cache() through __copy_from_user(). Is there any particular reason to not use direct accesses here? (Note that mod_l1_entry() has to use __copy_from_user() as it may be called from do_update_va_mapping()). Likewise I would think that there''s no strict need for update_intpte_sync() to use paging_cmpxchg_guest_entry(), but here I would agree that it''s easier to re-use the existing function than to create and use a new one. An additional piece of concern regarding the bit assignments of MMU_FLAG_RANGE_UPDATE''s val parameter (Keir, maybe you need to comment on this one): The whole mmu_update interface, being defined in public/xen.h, is supposed to be sufficiently architecture neutral, which it won''t be anymore in the way it currently is being modified. But maybe I''m mistaken and the interface''s declaration is just badly placed (would apply to the mmuext interface then, too)? In the Linux patch, I''d just like to see the abstraction to be less Xen- specific, i.e. something like (perhaps in include/asm-generic/pgtable.h) #ifndef arch_change_pte_range # define arch_change_pte_range(mm, pmd, addr, end, newprot) 0 #endif and then in change_pmd_range(): ... if (pmd_none_or_clear_bad(pmd)) continue; if (arch_change_pte_range(mm, pmd, addr, next, newprot)) continue; change_pte_range(mm, pmd, addr, next, newprot); } while (pmd++, addr = next, addr != end); ... The BUG() (which really can be BUG_ON() here) would go into the actual function then. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andi Kleen
2008-Jan-07 15:18 UTC
[Xen-devel] Re: [PATCH] Fix performance problems with mprotect()
"Bruce Rogers" <brogers@novell.com> writes:> + > + /* Allowed to change in Accessed/Dirty flags only. */ > + BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));Are you sure that BUG_ON can''t be triggered from the hypercall? It should error out in this case I think. -Andi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Bruce Rogers
2008-Jan-07 16:16 UTC
[Xen-devel] Re: [PATCH] Fix performance problems with mprotect()
I''m not sure how this BUG_ON would get triggered with current x86 paging semantics, and probably isn''t needed I think. Am I missing something? The code is almost exactly the same as the existing update_intpte, which has the same BUG_ON, and so I just left it in. (BTW:I considered merging the new functions into the non "sync" versions but opted not to.) - Bruce>>> On 1/7/2008 at 8:18 AM, in message <p73d4sdsly4.fsf@bingen.suse.de>, Andi Kleen<andi@firstfloor.org> wrote:> "Bruce Rogers" <brogers@novell.com> writes: >> + >> + /* Allowed to change in Accessed/Dirty flags only. */ >> + BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); > > > Are you sure that BUG_ON can''t be triggered from the hypercall? > It should error out in this case I think. > > -Andi_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Bruce Rogers
2008-Jan-07 17:05 UTC
Re: [Xen-devel] [PATCH] Fix performance problems with mprotect()
>>> On 1/7/2008 at 8:03 AM, in message <47824D56.76E4.0078.0@novell.com>, "JanBeulich" <jbeulich@novell.com> wrote:> On the Xen patch: > > Both added case blocks in do_mmu_update() have a nested switch > statement that seems redundant (i.e. the outer switch already only > handle PGT_l1_page_table pages, but the inner switch check this same > value again.Right - a silly oversight - will fix.> > The same two case block access addresses obtained from > map_domain_page_with_cache() through __copy_from_user(). Is there > any particular reason to not use direct accesses here? (Note that > mod_l1_entry() has to use __copy_from_user() as it may be called from > do_update_va_mapping()). Likewise I would think that there''s no strict > need for update_intpte_sync() to use paging_cmpxchg_guest_entry(), > but here I would agree that it''s easier to re-use the existing function > than to create and use a new one.I''ll change to use direct access.> > An additional piece of concern regarding the bit assignments of > MMU_FLAG_RANGE_UPDATE''s val parameter (Keir, maybe you need to > comment on this one): The whole mmu_update interface, being > defined in public/xen.h, is supposed to be sufficiently architecture > neutral, which it won''t be anymore in the way it currently is being > modified. But maybe I''m mistaken and the interface''s declaration is > just badly placed (would apply to the mmuext interface then, too)?I should point out that the MMU_FLAG_RANGE_UPDATE hypercall didn''t end up being needed for improving the SAP test, but does get invoked by other uses of mprotect() and feel that it is useful. As it stands I''ve only implemented x86 version of this, other architecture owners would need to do the same. Perhaps as far as what is presented in public/xen.h, we should just have a more generic description (not pointing out the format) and leave it up to each architecture as to what specific format for the val member makes sense.> > In the Linux patch, I''d just like to see the abstraction to be less Xen- > specific, i.e. something like > > (perhaps in include/asm-generic/pgtable.h) > > #ifndef arch_change_pte_range > # define arch_change_pte_range(mm, pmd, addr, end, newprot) 0 > #endif > > and then in change_pmd_range(): > > ... > if (pmd_none_or_clear_bad(pmd)) > continue; > if (arch_change_pte_range(mm, pmd, addr, next, newprot)) > continue; > change_pte_range(mm, pmd, addr, next, newprot); > } while (pmd++, addr = next, addr != end); > ... > > The BUG() (which really can be BUG_ON() here) would go into the actual > function then.Yes, that approach does look better. I''ll respin the patch.> > Jan > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Jan-07 17:35 UTC
Re: [Xen-devel] Re: [PATCH] Fix performance problems with mprotect()
The BUG_ON() is valid until the per-domain biglock is removed. -- Keir On 7/1/08 16:16, "Bruce Rogers" <BROGERS@novell.com> wrote:> I''m not sure how this BUG_ON would get triggered with current x86 paging > semantics, and probably isn''t needed I think. > Am I missing something? > The code is almost exactly the same as the existing update_intpte, which has > the same BUG_ON, and so I just left it in. > (BTW:I considered merging the new functions into the non "sync" versions but > opted not to.) > > - Bruce > >>>> On 1/7/2008 at 8:18 AM, in message <p73d4sdsly4.fsf@bingen.suse.de>, Andi >>>> Kleen > <andi@firstfloor.org> wrote: >> "Bruce Rogers" <brogers@novell.com> writes: >>> + >>> + /* Allowed to change in Accessed/Dirty flags only. */ >>> + BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); >> >> >> Are you sure that BUG_ON can''t be triggered from the hypercall? >> It should error out in this case I think. >> >> -Andi > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Jan-07 17:45 UTC
Re: [Xen-devel] [PATCH] Fix performance problems with mprotect()
On 7/1/08 17:05, "Bruce Rogers" <BROGERS@novell.com> wrote:>> An additional piece of concern regarding the bit assignments of >> MMU_FLAG_RANGE_UPDATE''s val parameter (Keir, maybe you need to >> comment on this one): The whole mmu_update interface, being >> defined in public/xen.h, is supposed to be sufficiently architecture >> neutral, which it won''t be anymore in the way it currently is being >> modified. But maybe I''m mistaken and the interface''s declaration is >> just badly placed (would apply to the mmuext interface then, too)? > I should point out that the MMU_FLAG_RANGE_UPDATE hypercall didn''t end up > being needed for improving the SAP test, but does get invoked by other uses of > mprotect() and feel that it is useful. As it stands I''ve only implemented x86 > version of this, other architecture owners would need to do the same. > Perhaps as far as what is presented in public/xen.h, we should just have a > more generic description (not pointing out the format) and leave it up to each > architecture as to what specific format for the val member makes sense.No other architecture uses do_mmu_update(). The public definitions are simply in the wrong header file. Is MMU_FLAG_RANGE_UPDATE really useful if you have MMU_ATOMIC_PT_UPDATE? If it is it''s presumably a performance issue, and the question should be: why is a sequence MMU_ATOMIC_PT_UPDATE slower than MMU_FLAG_RANGE_UPDATE, and can we make it fast enough that MMU_FLAG_RANGE_UPDATE is redundant? Also, the documentation for MMU_ATOMIC_PT_UDATE in xen.h is possibly incorrect. It doesn''t make sense to me -- are your meanings for ptr[2:] and val actually correct? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Bruce Rogers
2008-Jan-07 18:03 UTC
Re: [Xen-devel] [PATCH] Fix performance problems with mprotect()
>>> On 1/7/2008 at 10:45 AM, in message <C3A815C0.11D69%Keir.Fraser@cl.cam.ac.uk>,Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:> On 7/1/08 17:05, "Bruce Rogers" <BROGERS@novell.com> wrote: > >>> An additional piece of concern regarding the bit assignments of >>> MMU_FLAG_RANGE_UPDATE''s val parameter (Keir, maybe you need to >>> comment on this one): The whole mmu_update interface, being >>> defined in public/xen.h, is supposed to be sufficiently architecture >>> neutral, which it won''t be anymore in the way it currently is being >>> modified. But maybe I''m mistaken and the interface''s declaration is >>> just badly placed (would apply to the mmuext interface then, too)? >> I should point out that the MMU_FLAG_RANGE_UPDATE hypercall didn''t end up >> being needed for improving the SAP test, but does get invoked by other uses > of >> mprotect() and feel that it is useful. As it stands I''ve only implemented > x86 >> version of this, other architecture owners would need to do the same. >> Perhaps as far as what is presented in public/xen.h, we should just have a >> more generic description (not pointing out the format) and leave it up to > each >> architecture as to what specific format for the val member makes sense. > > No other architecture uses do_mmu_update(). The public definitions are > simply in the wrong header file. > > Is MMU_FLAG_RANGE_UPDATE really useful if you have MMU_ATOMIC_PT_UPDATE? If > it is it''s presumably a performance issue, and the question should be: why > is a sequence MMU_ATOMIC_PT_UPDATE slower than MMU_FLAG_RANGE_UPDATE, and > can we make it fast enough that MMU_FLAG_RANGE_UPDATE is redundant?I did find a quite measurable performance difference between using the two methods, but I consider it useful mainly because of its compactness in terms of being able to update all ptes in a page table without needing a large array to initialize from. That said, I am not beholden to this flag range hypercall, since the other one is more generic and we should be able to make it fast enough. If you prefer, we can just go with the MMU_ATOMIC_PT_UPDATE hypercall only.> > Also, the documentation for MMU_ATOMIC_PT_UDATE in xen.h is possibly > incorrect. It doesn''t make sense to me -- are your meanings for ptr[2:] and > val actually correct?Oh,you are correct. I had originally coded MMU_ATOMIC_PT_UDATE as it is described, since doing it that way was giving me about 14% better performance than doing the batching one mmu_update_t structure at a time, but I again opted for backing off to the more general case and forgot to update that description for that hypercall. I''ll get that fixed.> > -- Keir_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Bruce Rogers
2008-Jan-19 00:03 UTC
[Xen-devel] Re: [PATCH] Fix performance problems with mprotect()
I don''t see how. If you are seeing a problem, please elaborate. - Bruce>>> On 1/7/2008 at 8:18 AM, in message <p73d4sdsly4.fsf@bingen.suse.de>, Andi Kleen<andi@firstfloor.org> wrote:> "Bruce Rogers" <brogers@novell.com> writes: >> + >> + /* Allowed to change in Accessed/Dirty flags only. */ >> + BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); > > > Are you sure that BUG_ON can''t be triggered from the hypercall? > It should error out in this case I think. > > -Andi > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Apparently Analagous Threads
- [PATCH] virtio_net: Add schedule check to napi_enable call
- [PATCH] virtio_net: Add schedule check to napi_enable call
- Xen pv_ops domU :: BUG() in remove_from_page_cache()
- [PATCH 0/7] (Re-)introducing pvops for x86_64 - Real pvops work part
- [PATCH 0/7] (Re-)introducing pvops for x86_64 - Real pvops work part