Andres Lagar-Cavilla
2011-Nov-08 03:28 UTC
[Xen-devel] [PATCH 0 of 5] p2m synchronization groundwork
This patch series lays the groundwork for improving the synchronization of the primitives that access the p2m. It is a partial repost of the patches previously emailed as an RFC; these patches are now intended for committing to the tree.

We change the API for accessing the p2m to a family of functions, get_gfn/put_gfn. The name is intended to reflect the fact that even lookups are meant to obtain exclusive access to a p2m entry, and that said access should be relinquished (put_gfn) when done. The patches, however, alter little functionality: the API rename does not yet add any locking or ref-counting. It does, however, put up a "barrier" that will force any new commits to conform to the new API.

Patches are based off 24066:54a5e994a241. Should the new XENMEM calls be accepted before this, the series will need to be updated to change the API there as well.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

 xen/arch/x86/mm/mm-locks.h         |   27 ++++-
 xen/arch/x86/mm/mm-locks.h         |   46 +++++++++++
 xen/arch/x86/mm/mm-locks.h         |   12 ++
 xen/arch/x86/mm/p2m-pod.c          |   40 ++++++---
 xen/include/asm-x86/domain.h       |    3 +
 xen/include/asm-x86/p2m.h          |    5 +
 xen/arch/x86/mm/p2m.c              |   38 +++++++++
 xen/include/asm-x86/p2m.h          |   40 +---------
 xen/arch/x86/cpu/mcheck/vmce.c     |    9 +-
 xen/arch/x86/debug.c               |   17 ++-
 xen/arch/x86/domain.c              |   27 +++++-
 xen/arch/x86/domctl.c              |   15 ++-
 xen/arch/x86/hvm/emulate.c         |   29 ++++++-
 xen/arch/x86/hvm/hvm.c             |  133 +++++++++++++++++++++++++------
 xen/arch/x86/hvm/mtrr.c            |    2 +-
 xen/arch/x86/hvm/nestedhvm.c       |    2 +-
 xen/arch/x86/hvm/stdvga.c          |    4 +-
 xen/arch/x86/hvm/svm/nestedsvm.c   |   12 +-
 xen/arch/x86/hvm/svm/svm.c         |   11 +-
 xen/arch/x86/hvm/viridian.c        |    8 +-
 xen/arch/x86/hvm/vmx/vmx.c         |   15 ++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   13 ++-
 xen/arch/x86/mm.c                  |  153 +++++++++++++++++++++++++++++-------
 xen/arch/x86/mm/guest_walk.c       |   30 +++++-
 xen/arch/x86/mm/hap/guest_walk.c   |   16 ++-
 xen/arch/x86/mm/hap/nested_hap.c   |   15 ++-
 xen/arch/x86/mm/mem_event.c        |   23 ++++-
 xen/arch/x86/mm/mem_sharing.c      |   27 +++++-
 xen/arch/x86/mm/p2m-pod.c          |   19 ++-
 xen/arch/x86/mm/p2m-pt.c           |    6 +-
 xen/arch/x86/mm/p2m.c              |   34 ++++---
 xen/arch/x86/mm/shadow/common.c    |    6 +-
 xen/arch/x86/mm/shadow/multi.c     |   85 ++++++++++++++-----
 xen/arch/x86/mm/shadow/types.h     |   10 +-
 xen/arch/x86/physdev.c             |    8 +-
 xen/arch/x86/traps.c               |   19 +++-
 xen/common/grant_table.c           |   30 +++++-
 xen/common/memory.c                |   13 ++-
 xen/common/tmem_xen.c              |   21 +++-
 xen/include/asm-ia64/mm.h          |    2 +
 xen/include/asm-x86/guest_pt.h     |    6 +-
 xen/include/asm-x86/hvm/hvm.h      |    5 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h |    1 +
 xen/include/asm-x86/p2m.h          |   38 ++++++-
 44 files changed, 799 insertions(+), 276 deletions(-)
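For reference, the calling pattern the new API expects at every p2m lookup site is sketched below. This is an illustration only (the helper function and its error value are made up); the point is that each get_gfn() must be balanced by a put_gfn() on every exit path:

    /* Illustrative sketch of a caller under the proposed get_gfn/put_gfn API. */
    static int example_query_gfn(struct domain *d, unsigned long gfn)
    {
        p2m_type_t t;
        unsigned long mfn;

        mfn = mfn_x(get_gfn(d, gfn, &t));  /* lookup takes exclusive access to the entry */
        if ( !p2m_is_ram(t) || !mfn_valid(mfn) )
        {
            put_gfn(d, gfn);               /* relinquish on the error path too */
            return -EINVAL;
        }

        /* ... operate on the mfn while the entry is held ... */

        put_gfn(d, gfn);                   /* relinquish when done */
        return 0;
    }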
Andres Lagar-Cavilla
2011-Nov-08 03:28 UTC
[Xen-devel] [PATCH 1 of 5] Refactor mm-lock ordering constructs
 xen/arch/x86/mm/mm-locks.h |  27 +++++++++++++++++++--------
 1 files changed, 19 insertions(+), 8 deletions(-)

The mm layer has a construct to enforce that locks are taken in a pre-defined order, and thus avert deadlock. Refactor pieces of this code for later use; no functional changes.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 54a5e994a241 -r 75f1e156386d xen/arch/x86/mm/mm-locks.h
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -28,6 +28,7 @@
 
 /* Per-CPU variable for enforcing the lock ordering */
 DECLARE_PER_CPU(int, mm_lock_level);
+#define __get_lock_level()  (this_cpu(mm_lock_level))
 
 static inline void mm_lock_init(mm_lock_t *l)
 {
@@ -42,22 +43,32 @@ static inline int mm_locked_by_me(mm_loc
     return (l->lock.recurse_cpu == current->processor);
 }
 
+/* If you see this crash, the numbers printed are lines in this file
+ * where the offending locks are declared. */
+#define __check_lock_level(l)                           \
+do {                                                    \
+    if ( unlikely(__get_lock_level()) > (l) )           \
+        panic("mm locking order violation: %i > %i\n",  \
+              __get_lock_level(), (l));                 \
+} while(0)
+
+#define __set_lock_level(l)                             \
+do {                                                    \
+    __get_lock_level() = (l);                           \
+} while(0)
+
 static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
 {
-    /* If you see this crash, the numbers printed are lines in this file
-     * where the offending locks are declared. */
-    if ( unlikely(this_cpu(mm_lock_level) > level) )
-        panic("mm locking order violation: %i > %i\n",
-              this_cpu(mm_lock_level), level);
+    __check_lock_level(level);
     spin_lock_recursive(&l->lock);
     if ( l->lock.recurse_cnt == 1 )
     {
         l->locker_function = func;
-        l->unlock_level = this_cpu(mm_lock_level);
+        l->unlock_level = __get_lock_level();
     }
     else if ( (unlikely(!rec)) )
         panic("mm lock already held by %s\n", l->locker_function);
-    this_cpu(mm_lock_level) = level;
+    __set_lock_level(level);
 }
 
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
@@ -72,7 +83,7 @@ static inline void mm_unlock(mm_lock_t *
     if ( l->lock.recurse_cnt == 1 )
     {
         l->locker_function = "nobody";
-        this_cpu(mm_lock_level) = l->unlock_level;
+        __set_lock_level(l->unlock_level);
     }
     spin_unlock_recursive(&l->lock);
 }
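As a reminder of how these helpers are consumed (unchanged by this patch), a lock declared through the existing wrappers picks up its ordering level from its line number in mm-locks.h. Sketch with a hypothetical lock name:

    /* Hypothetical lock: its level is the __LINE__ of the declaration below,
     * so locks declared later in mm-locks.h must be taken after locks declared
     * earlier; __check_lock_level() panics (printing both line numbers) when a
     * CPU tries to take them in the wrong order. */
    declare_mm_lock(example)
    #define example_lock(l)   mm_lock(example, l)
    #define example_unlock(l) mm_unlock(l)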
Andres Lagar-Cavilla
2011-Nov-08 03:28 UTC
[Xen-devel] [PATCH 2 of 5] Declare an order-enforcing construct for external locks used in the mm layer
 xen/arch/x86/mm/mm-locks.h |  46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 46 insertions(+), 0 deletions(-)

Declare an order-enforcing construct for a lock used in the mm layer that is not of type mm_lock_t. This is useful whenever the mm layer takes locks from other subsystems, or locks not implemented as mm_lock_t.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 75f1e156386d -r 81eedccb3a85 xen/arch/x86/mm/mm-locks.h
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -70,6 +70,27 @@ static inline void _mm_lock(mm_lock_t *l
         panic("mm lock already held by %s\n", l->locker_function);
     __set_lock_level(level);
 }
+
+static inline void _mm_enforce_order_lock_pre(int level)
+{
+    __check_lock_level(level);
+}
+
+static inline void _mm_enforce_order_lock_post(int level, int *unlock_level,
+                                               unsigned short *recurse_count)
+{
+    if ( recurse_count )
+    {
+        if ( *recurse_count++ == 0 )
+        {
+            *unlock_level = __get_lock_level();
+        }
+    } else {
+        *unlock_level = __get_lock_level();
+    }
+    __set_lock_level(level);
+}
+
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
   static inline void mm_lock_##name(mm_lock_t *l, const char *func, int rec)\
@@ -78,6 +99,16 @@ static inline void _mm_lock(mm_lock_t *l
 #define mm_lock(name, l) mm_lock_##name(l, __func__, 0)
 #define mm_lock_recursive(name, l) mm_lock_##name(l, __func__, 1)
 
+/* This wrapper is intended for "external" locks which do not use
+ * the mm_lock_t types. Such locks inside the mm code are also subject
+ * to ordering constraints. */
+#define declare_mm_order_constraint(name)                                   \
+    static inline void mm_enforce_order_lock_pre_##name(void)               \
+    { _mm_enforce_order_lock_pre(__LINE__); }                               \
+    static inline void mm_enforce_order_lock_post_##name(                   \
+                        int *unlock_level, unsigned short *recurse_count)   \
+    { _mm_enforce_order_lock_post(__LINE__, unlock_level, recurse_count); } \
+
 static inline void mm_unlock(mm_lock_t *l)
 {
     if ( l->lock.recurse_cnt == 1 )
@@ -88,6 +119,21 @@ static inline void mm_unlock(mm_lock_t *
     spin_unlock_recursive(&l->lock);
 }
 
+static inline void mm_enforce_order_unlock(int unlock_level,
+                                           unsigned short *recurse_count)
+{
+    if ( recurse_count )
+    {
+        BUG_ON(*recurse_count == 0);
+        if ( *recurse_count-- == 1 )
+        {
+            __set_lock_level(unlock_level);
+        }
+    } else {
+        __set_lock_level(unlock_level);
+    }
+}
+
 /************************************************************************
  *                                                                      *
  * To avoid deadlocks, these locks _MUST_ be taken in the order they're *
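As an illustration of the intended use (this mirrors what the next patch does for the page_alloc lock; the lock and the unlock_level field below are placeholders, not existing code):

    /* In mm-locks.h, at the file position that encodes the ordering level: */
    declare_mm_order_constraint(example)

    /* At the call site, wrapping an external (non-mm_lock_t) spinlock: */
    static inline void lock_example(struct domain *d)
    {
        mm_enforce_order_lock_pre_example();      /* check the current level */
        spin_lock(&d->example_lock);              /* the external lock itself */
        mm_enforce_order_lock_post_example(&d->example_unlock_level, NULL);
    }

    static inline void unlock_example(struct domain *d)
    {
        mm_enforce_order_unlock(d->example_unlock_level, NULL);
        spin_unlock(&d->example_lock);
    }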
Andres Lagar-Cavilla
2011-Nov-08 03:28 UTC
[Xen-devel] [PATCH 3 of 5] Enforce ordering constraints for the page alloc lock in the PoD code
 xen/arch/x86/mm/mm-locks.h   |  12 ++++++++++++
 xen/arch/x86/mm/p2m-pod.c    |  40 +++++++++++++++++++++++++++-------------
 xen/include/asm-x86/domain.h |   3 +++
 xen/include/asm-x86/p2m.h    |   5 +++++
 4 files changed, 47 insertions(+), 13 deletions(-)

The page alloc lock is sometimes used in the PoD code, with an explicit expectation of ordering. Use our ordering constructs in the mm layer to enforce this.

The additional book-keeping variables are kept in the arch_domain sub-struct, as they are x86-specific to the whole domain.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 81eedccb3a85 -r 0e8fb21ab4ff xen/arch/x86/mm/mm-locks.h
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -174,6 +174,18 @@ declare_mm_lock(p2m)
 #define p2m_unlock(p)         mm_unlock(&(p)->lock)
 #define p2m_locked_by_me(p)   mm_locked_by_me(&(p)->lock)
 
+/* Page alloc lock (per-domain)
+ *
+ * This is an external lock, not represented by an mm_lock_t. However,
+ * pod code uses it in conjunction with the p2m lock, and expecting
+ * the ordering which we enforce here.
+ * The lock is not recursive. */
+
+declare_mm_order_constraint(page_alloc)
+#define page_alloc_mm_pre_lock()   mm_enforce_order_lock_pre_page_alloc()
+#define page_alloc_mm_post_lock(l) mm_enforce_order_lock_post_page_alloc(&(l), NULL)
+#define page_alloc_mm_unlock(l)    mm_enforce_order_unlock((l), NULL)
+
 /* Paging lock (per-domain)
  *
  * For shadow pagetables, this lock protects
diff -r 81eedccb3a85 -r 0e8fb21ab4ff xen/arch/x86/mm/p2m-pod.c
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -45,6 +45,20 @@
 
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
+/* Enforce lock ordering when grabbing the "external" page_alloc lock */
+static inline void lock_page_alloc(struct p2m_domain *p2m)
+{
+    page_alloc_mm_pre_lock();
+    spin_lock(&(p2m->domain->page_alloc_lock));
+    page_alloc_mm_post_lock(p2m->domain->arch.page_alloc_unlock_level);
+}
+
+static inline void unlock_page_alloc(struct p2m_domain *p2m)
+{
+    page_alloc_mm_unlock(p2m->domain->arch.page_alloc_unlock_level);
+    spin_unlock(&(p2m->domain->page_alloc_lock));
+}
+
 /*
  * Populate-on-demand functionality
  */
@@ -100,7 +114,7 @@ p2m_pod_cache_add(struct p2m_domain *p2m
         unmap_domain_page(b);
     }
 
-    spin_lock(&d->page_alloc_lock);
+    lock_page_alloc(p2m);
 
     /* First, take all pages off the domain list */
     for(i=0; i < 1 << order ; i++)
@@ -128,7 +142,7 @@ p2m_pod_cache_add(struct p2m_domain *p2m
      * This may cause "zombie domains" since the page will never be freed. */
     BUG_ON( d->arch.relmem != RELMEM_not_started );
 
-    spin_unlock(&d->page_alloc_lock);
+    unlock_page_alloc(p2m);
 
     return 0;
 }
@@ -245,7 +259,7 @@ p2m_pod_set_cache_target(struct p2m_doma
 
         /* Grab the lock before checking that pod.super is empty, or the last
          * entries may disappear before we grab the lock. */
-        spin_lock(&d->page_alloc_lock);
+        lock_page_alloc(p2m);
 
         if ( (p2m->pod.count - pod_target) > SUPERPAGE_PAGES
              && !page_list_empty(&p2m->pod.super) )
@@ -257,7 +271,7 @@ p2m_pod_set_cache_target(struct p2m_doma
 
         ASSERT(page != NULL);
 
-        spin_unlock(&d->page_alloc_lock);
+        unlock_page_alloc(p2m);
 
         /* Then free them */
         for ( i = 0 ; i < (1 << order) ; i++ )
@@ -378,7 +392,7 @@ p2m_pod_empty_cache(struct domain *d)
     BUG_ON(!d->is_dying);
     spin_barrier(&p2m->lock.lock);
 
-    spin_lock(&d->page_alloc_lock);
+    lock_page_alloc(p2m);
 
     while ( (page = page_list_remove_head(&p2m->pod.super)) )
     {
@@ -403,7 +417,7 @@ p2m_pod_empty_cache(struct domain *d)
 
     BUG_ON(p2m->pod.count != 0);
 
-    spin_unlock(&d->page_alloc_lock);
+    unlock_page_alloc(p2m);
 }
 
 int
@@ -417,7 +431,7 @@ p2m_pod_offline_or_broken_hit(struct pag
     if ( !(d = page_get_owner(p)) || !(p2m = p2m_get_hostp2m(d)) )
         return 0;
 
-    spin_lock(&d->page_alloc_lock);
+    lock_page_alloc(p2m);
     bmfn = mfn_x(page_to_mfn(p));
     page_list_for_each_safe(q, tmp, &p2m->pod.super)
     {
@@ -448,12 +462,12 @@ p2m_pod_offline_or_broken_hit(struct pag
         }
     }
 
-    spin_unlock(&d->page_alloc_lock);
+    unlock_page_alloc(p2m);
     return 0;
 
 pod_hit:
     page_list_add_tail(p, &d->arch.relmem_list);
-    spin_unlock(&d->page_alloc_lock);
+    unlock_page_alloc(p2m);
     return 1;
 }
 
@@ -994,7 +1008,7 @@ p2m_pod_demand_populate(struct p2m_domai
     if ( q == p2m_guest && gfn > p2m->pod.max_guest )
         p2m->pod.max_guest = gfn;
 
-    spin_lock(&d->page_alloc_lock);
+    lock_page_alloc(p2m);
 
     if ( p2m->pod.count == 0 )
         goto out_of_memory;
@@ -1008,7 +1022,7 @@ p2m_pod_demand_populate(struct p2m_domai
 
     BUG_ON((mfn_x(mfn) & ((1 << order)-1)) != 0);
 
-    spin_unlock(&d->page_alloc_lock);
+    unlock_page_alloc(p2m);
 
     gfn_aligned = (gfn >> order) << order;
 
@@ -1040,7 +1054,7 @@ p2m_pod_demand_populate(struct p2m_domai
     return 0;
 out_of_memory:
-    spin_unlock(&d->page_alloc_lock);
+    unlock_page_alloc(p2m);
 
     printk("%s: Out of populate-on-demand memory! tot_pages %" PRIu32 " pod_entries %" PRIi32 "\n",
            __func__, d->tot_pages, p2m->pod.entry_count);
@@ -1049,7 +1063,7 @@ out_fail:
     return -1;
 remap_and_retry:
     BUG_ON(order != PAGE_ORDER_2M);
-    spin_unlock(&d->page_alloc_lock);
+    unlock_page_alloc(p2m);
 
     /* Remap this 2-meg region in singleton chunks */
     gfn_aligned = (gfn>>order)<<order;
diff -r 81eedccb3a85 -r 0e8fb21ab4ff xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -269,6 +269,9 @@ struct arch_domain
     struct paging_domain paging;
     struct p2m_domain *p2m;
+    /* To enforce lock ordering in the pod code wrt the
+     * page_alloc lock */
+    int page_alloc_unlock_level;
 
     /* nestedhvm: translate l2 guest physical to host physical */
     struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
diff -r 81eedccb3a85 -r 0e8fb21ab4ff xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -270,6 +270,11 @@ struct p2m_domain {
      * + p2m_pod_demand_populate() grabs both; the p2m lock to avoid
      *   double-demand-populating of pages, the page_alloc lock to
      *   protect moving stuff from the PoD cache to the domain page list.
+     *
+     * We enforce this lock ordering through a construct in mm-locks.h.
+     * This demands, however, that we store the previous lock-ordering
+     * level in effect before grabbing the page_alloc lock. The unlock
+     * level is stored in the arch section of the domain struct.
      */
     struct {
         struct page_list_head super,   /* List of superpages                */
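The nesting that the construct now enforces is the one already documented in p2m.h, with the p2m lock on the outside and the page_alloc lock on the inside. A condensed sketch (illustrative only; p2m_pod_demand_populate() is the real instance of this pattern):

    static void example_pod_path(struct p2m_domain *p2m)
    {
        p2m_lock(p2m);
        lock_page_alloc(p2m);      /* records the previous level, then raises it */
        /* ... move pages between the PoD cache and the domain page list ... */
        unlock_page_alloc(p2m);    /* restores the recorded level */
        p2m_unlock(p2m);
    }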
Andres Lagar-Cavilla
2011-Nov-08 03:28 UTC
[Xen-devel] [PATCH 4 of 5] Refactor p2m get_entry accessor
 xen/arch/x86/mm/p2m.c     |  38 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  40 ++--------------------------------------
 2 files changed, 40 insertions(+), 38 deletions(-)

Move the main query accessor to the p2m outside of an inline and into the p2m code itself. This will allow for p2m internal locking to be added to the accessor later.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 0e8fb21ab4ff -r f07d4ebe5248 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -144,6 +144,44 @@ void p2m_change_entry_type_global(struct
     p2m_unlock(p2m);
 }
 
+mfn_t gfn_to_mfn_type_p2m(struct p2m_domain *p2m, unsigned long gfn,
+                    p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
+                    unsigned int *page_order)
+{
+    mfn_t mfn;
+
+    if ( !p2m || !paging_mode_translate(p2m->domain) )
+    {
+        /* Not necessarily true, but for non-translated guests, we claim
+         * it's the most generic kind of memory */
+        *t = p2m_ram_rw;
+        return _mfn(gfn);
+    }
+
+    mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
+
+#ifdef __x86_64__
+    if ( q == p2m_unshare && p2m_is_shared(*t) )
+    {
+        ASSERT(!p2m_is_nestedp2m(p2m));
+        mem_sharing_unshare_page(p2m->domain, gfn, 0);
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
+    }
+#endif
+
+#ifdef __x86_64__
+    if (unlikely((p2m_is_broken(*t))))
+    {
+        /* Return invalid_mfn to avoid caller's access */
+        mfn = _mfn(INVALID_MFN);
+        if (q == p2m_guest)
+            domain_crash(p2m->domain);
+    }
+#endif
+
+    return mfn;
+}
+
 int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
                   unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
diff -r 0e8fb21ab4ff -r f07d4ebe5248 xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -313,45 +313,9 @@ struct p2m_domain *p2m_get_p2m(struct vc
  * If the lookup succeeds, the return value is != INVALID_MFN and
  * *page_order is filled in with the order of the superpage (if any) that
  * the entry was found in.  */
-static inline mfn_t
-gfn_to_mfn_type_p2m(struct p2m_domain *p2m, unsigned long gfn,
+mfn_t gfn_to_mfn_type_p2m(struct p2m_domain *p2m, unsigned long gfn,
                     p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
-                    unsigned int *page_order)
-{
-    mfn_t mfn;
-
-    if ( !p2m || !paging_mode_translate(p2m->domain) )
-    {
-        /* Not necessarily true, but for non-translated guests, we claim
-         * it's the most generic kind of memory */
-        *t = p2m_ram_rw;
-        return _mfn(gfn);
-    }
-
-    mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
-
-#ifdef __x86_64__
-    if ( q == p2m_unshare && p2m_is_shared(*t) )
-    {
-        ASSERT(!p2m_is_nestedp2m(p2m));
-        mem_sharing_unshare_page(p2m->domain, gfn, 0);
-        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
-    }
-#endif
-
-#ifdef __x86_64__
-    if (unlikely((p2m_is_broken(*t))))
-    {
-        /* Return invalid_mfn to avoid caller's access */
-        mfn = _mfn(INVALID_MFN);
-        if (q == p2m_guest)
-            domain_crash(p2m->domain);
-    }
-#endif
-
-    return mfn;
-}
-
+                    unsigned int *page_order);
 
 /* General conversion function from gfn to mfn */
 static inline mfn_t gfn_to_mfn_type(struct domain *d,
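To make the motivation concrete: once the accessor is out of line, per-lookup locking can later be slotted in around the ->get_entry() call without touching any header. The skeleton below is not part of this patch; the commented-out lock call is a purely hypothetical placeholder for what a later patch could add:

    mfn_t example_locked_lookup(struct p2m_domain *p2m, unsigned long gfn,
                                p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                                unsigned int *page_order)
    {
        mfn_t mfn;

        /* p2m_entry_lock(p2m, gfn);   hypothetical: exclusive access taken here, */
        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
        /* ... and relinquished by the caller's matching put_gfn()                */

        return mfn;
    }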
Andres Lagar-Cavilla
2011-Nov-08 03:28 UTC
[Xen-devel] [PATCH 5 of 5] Modify naming of queries into the p2m
 xen/arch/x86/cpu/mcheck/vmce.c     |    9 +-
 xen/arch/x86/debug.c               |   17 ++-
 xen/arch/x86/domain.c              |   27 +++++-
 xen/arch/x86/domctl.c              |   15 ++-
 xen/arch/x86/hvm/emulate.c         |   29 ++++++-
 xen/arch/x86/hvm/hvm.c             |  133 +++++++++++++++++++++++++------
 xen/arch/x86/hvm/mtrr.c            |    2 +-
 xen/arch/x86/hvm/nestedhvm.c       |    2 +-
 xen/arch/x86/hvm/stdvga.c          |    4 +-
 xen/arch/x86/hvm/svm/nestedsvm.c   |   12 +-
 xen/arch/x86/hvm/svm/svm.c         |   11 +-
 xen/arch/x86/hvm/viridian.c        |    8 +-
 xen/arch/x86/hvm/vmx/vmx.c         |   15 ++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   13 ++-
 xen/arch/x86/mm.c                  |  153 +++++++++++++++++++++++++++++-------
 xen/arch/x86/mm/guest_walk.c       |   30 +++++-
 xen/arch/x86/mm/hap/guest_walk.c   |   16 ++-
 xen/arch/x86/mm/hap/nested_hap.c   |   15 ++-
 xen/arch/x86/mm/mem_event.c        |   23 ++++-
 xen/arch/x86/mm/mem_sharing.c      |   27 +++++-
 xen/arch/x86/mm/p2m-pod.c          |   19 ++-
 xen/arch/x86/mm/p2m-pt.c           |    6 +-
 xen/arch/x86/mm/p2m.c              |   34 ++++---
 xen/arch/x86/mm/shadow/common.c    |    6 +-
 xen/arch/x86/mm/shadow/multi.c     |   85 ++++++++++++++-----
 xen/arch/x86/mm/shadow/types.h     |   10 +-
 xen/arch/x86/physdev.c             |    8 +-
 xen/arch/x86/traps.c               |   19 +++-
 xen/common/grant_table.c           |   30 +++++-
 xen/common/memory.c                |   13 ++-
 xen/common/tmem_xen.c              |   21 +++-
 xen/include/asm-ia64/mm.h          |    2 +
 xen/include/asm-x86/guest_pt.h     |    6 +-
 xen/include/asm-x86/hvm/hvm.h      |    5 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h |    1 +
 xen/include/asm-x86/p2m.h          |   38 ++++++--
 36 files changed, 647 insertions(+), 217 deletions(-)

Callers of lookups into the p2m code are now variants of get_gfn, and all callers need to call put_gfn afterwards. The code behind these calls is a no-op at the moment, but will change to proper locking in a later patch.

This patch does not change functionality: it only changes naming and adds the matching put_gfn calls. set_p2m_entry retains its name because it is always called with the p2m lock held.

This patch is, unfortunately, humongous, given the dozens of call sites involved. After this patch, code still using the old-style gfn_to_mfn will no longer compile. This is on purpose: adapt to the new API.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/cpu/mcheck/vmce.c --- a/xen/arch/x86/cpu/mcheck/vmce.c +++ b/xen/arch/x86/cpu/mcheck/vmce.c @@ -574,6 +574,7 @@ int unmmap_broken_page(struct domain *d, { mfn_t r_mfn; p2m_type_t pt; + int rc; /* Always trust dom0''s MCE handler will prevent future access */ if ( d == dom0 ) @@ -585,14 +586,16 @@ int unmmap_broken_page(struct domain *d, if ( !is_hvm_domain(d) || !paging_mode_hap(d) ) return -ENOSYS; - r_mfn = gfn_to_mfn_query(d, gfn, &pt); + rc = -1; + r_mfn = get_gfn_query(d, gfn, &pt); if ( p2m_to_mask(pt) & P2M_UNMAP_TYPES) { ASSERT(mfn_x(r_mfn) == mfn_x(mfn)); p2m_change_type(d, gfn, pt, p2m_ram_broken); - return 0; + rc = 0; } + put_gfn(d, gfn); - return -1; + return rc; } diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/debug.c --- a/xen/arch/x86/debug.c +++ b/xen/arch/x86/debug.c @@ -43,22 +43,23 @@ /* Returns: mfn for the given (hvm guest) vaddr */ static unsigned long -dbg_hvm_va2mfn(dbgva_t vaddr, struct domain *dp, int toaddr) +dbg_hvm_va2mfn(dbgva_t vaddr, struct domain *dp, int toaddr, + unsigned long *gfn) { - unsigned long mfn, gfn; + unsigned long mfn; uint32_t pfec = PFEC_page_present; p2m_type_t gfntype; DBGP2("vaddr:%lx domid:%d\n", vaddr, dp->domain_id); - gfn = paging_gva_to_gfn(dp->vcpu[0], vaddr, &pfec); - if ( gfn == INVALID_GFN ) + *gfn = paging_gva_to_gfn(dp->vcpu[0], vaddr, &pfec); + if ( *gfn == INVALID_GFN ) { DBGP2("kdb:bad gfn from gva_to_gfn\n"); return INVALID_MFN; } - mfn = mfn_x(gfn_to_mfn(dp, gfn, &gfntype)); + mfn = mfn_x(get_gfn(dp, *gfn, &gfntype)); if ( p2m_is_readonly(gfntype) && toaddr ) { DBGP2("kdb:p2m_is_readonly: gfntype:%x\n", gfntype); @@ -193,12 +194,12 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t while ( len > 0 ) { char *va; - unsigned long mfn, pagecnt; + unsigned long mfn, gfn = INVALID_GFN, pagecnt; pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len); mfn = (dp->is_hvm - ? dbg_hvm_va2mfn(addr, dp, toaddr) + ? 
dbg_hvm_va2mfn(addr, dp, toaddr, &gfn) : dbg_pv_va2mfn(addr, dp, pgd3)); if ( mfn == INVALID_MFN ) break; @@ -217,6 +218,8 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t } unmap_domain_page(va); + if ( gfn != INVALID_GFN ) + put_gfn(dp, gfn); addr += pagecnt; buf += pagecnt; diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/domain.c --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -720,6 +720,7 @@ int arch_set_info_guest( struct vcpu *v, vcpu_guest_context_u c) { struct domain *d = v->domain; + unsigned long cr3_gfn; unsigned long cr3_pfn = INVALID_MFN; unsigned long flags, cr4; unsigned int i; @@ -931,7 +932,8 @@ int arch_set_info_guest( if ( !compat ) { - cr3_pfn = gmfn_to_mfn(d, xen_cr3_to_pfn(c.nat->ctrlreg[3])); + cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]); + cr3_pfn = get_gfn_untyped(d, cr3_gfn); if ( !mfn_valid(cr3_pfn) || (paging_mode_refcounts(d) @@ -939,16 +941,18 @@ int arch_set_info_guest( : !get_page_and_type(mfn_to_page(cr3_pfn), d, PGT_base_page_table)) ) { + put_gfn(d, cr3_gfn); destroy_gdt(v); return -EINVAL; } v->arch.guest_table = pagetable_from_pfn(cr3_pfn); - + put_gfn(d, cr3_gfn); #ifdef __x86_64__ if ( c.nat->ctrlreg[1] ) { - cr3_pfn = gmfn_to_mfn(d, xen_cr3_to_pfn(c.nat->ctrlreg[1])); + cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]); + cr3_pfn = get_gfn_untyped(d, cr3_gfn); if ( !mfn_valid(cr3_pfn) || (paging_mode_refcounts(d) @@ -962,11 +966,13 @@ int arch_set_info_guest( put_page(mfn_to_page(cr3_pfn)); else put_page_and_type(mfn_to_page(cr3_pfn)); + put_gfn(d, cr3_gfn); destroy_gdt(v); return -EINVAL; } v->arch.guest_table_user = pagetable_from_pfn(cr3_pfn); + put_gfn(d, cr3_gfn); } else if ( !(flags & VGCF_in_kernel) ) { @@ -978,7 +984,8 @@ int arch_set_info_guest( { l4_pgentry_t *l4tab; - cr3_pfn = gmfn_to_mfn(d, compat_cr3_to_pfn(c.cmp->ctrlreg[3])); + cr3_gfn = compat_cr3_to_pfn(c.cmp->ctrlreg[3]); + cr3_pfn = get_gfn_untyped(d, cr3_gfn); if ( !mfn_valid(cr3_pfn) || (paging_mode_refcounts(d) @@ -986,6 +993,7 @@ int arch_set_info_guest( : !get_page_and_type(mfn_to_page(cr3_pfn), d, PGT_l3_page_table)) ) { + put_gfn(d, cr3_gfn); destroy_gdt(v); return -EINVAL; } @@ -993,6 +1001,7 @@ int arch_set_info_guest( l4tab = __va(pagetable_get_paddr(v->arch.guest_table)); *l4tab = l4e_from_pfn( cr3_pfn, _PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED); + put_gfn(d, cr3_gfn); #endif } @@ -1058,11 +1067,12 @@ unmap_vcpu_info(struct vcpu *v) * event doesn''t get missed. 
*/ static int -map_vcpu_info(struct vcpu *v, unsigned long mfn, unsigned offset) +map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset) { struct domain *d = v->domain; void *mapping; vcpu_info_t *new_info; + unsigned long mfn; int i; if ( offset > (PAGE_SIZE - sizeof(vcpu_info_t)) ) @@ -1075,15 +1085,19 @@ map_vcpu_info(struct vcpu *v, unsigned l if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) return -EINVAL; - mfn = gmfn_to_mfn(d, mfn); + mfn = get_gfn_untyped(d, gfn); if ( !mfn_valid(mfn) || !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + { + put_gfn(d, gfn); return -EINVAL; + } mapping = map_domain_page_global(mfn); if ( mapping == NULL ) { put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, gfn); return -ENOMEM; } @@ -1113,6 +1127,7 @@ map_vcpu_info(struct vcpu *v, unsigned l for ( i = 0; i < BITS_PER_EVTCHN_WORD(d); i++ ) set_bit(i, &vcpu_info(v, evtchn_pending_sel)); + put_gfn(d, gfn); return 0; } diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/domctl.c --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -199,7 +199,7 @@ long arch_do_domctl( for ( j = 0; j < k; j++ ) { - unsigned long type = 0, mfn = gmfn_to_mfn(d, arr[j]); + unsigned long type = 0, mfn = get_gfn_untyped(d, arr[j]); page = mfn_to_page(mfn); @@ -235,6 +235,7 @@ long arch_do_domctl( type = XEN_DOMCTL_PFINFO_XTAB; arr[j] = type; + put_gfn(d, arr[j]); } if ( copy_to_guest_offset(domctl->u.getpageframeinfo3.array, @@ -299,7 +300,8 @@ long arch_do_domctl( for ( j = 0; j < k; j++ ) { struct page_info *page; - unsigned long mfn = gmfn_to_mfn(d, arr32[j]); + unsigned long gfn = arr32[j]; + unsigned long mfn = get_gfn_untyped(d, gfn); page = mfn_to_page(mfn); @@ -310,8 +312,10 @@ long arch_do_domctl( unlikely(is_xen_heap_mfn(mfn)) ) arr32[j] |= XEN_DOMCTL_PFINFO_XTAB; else if ( xsm_getpageframeinfo(page) != 0 ) + { + put_gfn(d, gfn); continue; - else if ( likely(get_page(page, d)) ) + } else if ( likely(get_page(page, d)) ) { unsigned long type = 0; @@ -339,6 +343,7 @@ long arch_do_domctl( else arr32[j] |= XEN_DOMCTL_PFINFO_XTAB; + put_gfn(d, gfn); } if ( copy_to_guest_offset(domctl->u.getpageframeinfo2.array, @@ -425,12 +430,13 @@ long arch_do_domctl( break; } - mfn = gmfn_to_mfn(d, gmfn); + mfn = get_gfn_untyped(d, gmfn); ret = -EACCES; if ( !mfn_valid(mfn) || !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) { + put_gfn(d, gmfn); rcu_unlock_domain(d); break; } @@ -443,6 +449,7 @@ long arch_do_domctl( put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, gmfn); rcu_unlock_domain(d); } break; diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/emulate.c --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -63,14 +63,18 @@ static int hvmemul_do_io( int rc; /* Check for paged out page */ - ram_mfn = gfn_to_mfn_unshare(curr->domain, ram_gfn, &p2mt); + ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt); if ( p2m_is_paging(p2mt) ) { p2m_mem_paging_populate(curr->domain, ram_gfn); + put_gfn(curr->domain, ram_gfn); return X86EMUL_RETRY; } if ( p2m_is_shared(p2mt) ) + { + put_gfn(curr->domain, ram_gfn); return X86EMUL_RETRY; + } /* * Weird-sized accesses have undefined behaviour: we discard writes @@ -82,6 +86,7 @@ static int hvmemul_do_io( ASSERT(p_data != NULL); /* cannot happen with a REP prefix */ if ( dir == IOREQ_READ ) memset(p_data, ~0, size); + put_gfn(curr->domain, ram_gfn); return X86EMUL_UNHANDLEABLE; } @@ -101,7 +106,10 @@ static int hvmemul_do_io( paddr_t pa = vio->mmio_large_write_pa; unsigned int bytes = vio->mmio_large_write_bytes; if ( 
(addr >= pa) && ((addr + size) <= (pa + bytes)) ) + { + put_gfn(curr->domain, ram_gfn); return X86EMUL_OKAY; + } } else { @@ -111,6 +119,7 @@ static int hvmemul_do_io( { memcpy(p_data, &vio->mmio_large_read[addr - pa], size); + put_gfn(curr->domain, ram_gfn); return X86EMUL_OKAY; } } @@ -123,15 +132,22 @@ static int hvmemul_do_io( case HVMIO_completed: vio->io_state = HVMIO_none; if ( p_data == NULL ) + { + put_gfn(curr->domain, ram_gfn); return X86EMUL_UNHANDLEABLE; + } goto finish_access; case HVMIO_dispatched: /* May have to wait for previous cycle of a multi-write to complete. */ if ( is_mmio && !value_is_ptr && (dir == IOREQ_WRITE) && (addr == (vio->mmio_large_write_pa + vio->mmio_large_write_bytes)) ) + { + put_gfn(curr->domain, ram_gfn); return X86EMUL_RETRY; + } default: + put_gfn(curr->domain, ram_gfn); return X86EMUL_UNHANDLEABLE; } @@ -139,6 +155,7 @@ static int hvmemul_do_io( { gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n", p->state); + put_gfn(curr->domain, ram_gfn); return X86EMUL_UNHANDLEABLE; } @@ -189,7 +206,10 @@ static int hvmemul_do_io( } if ( rc != X86EMUL_OKAY ) + { + put_gfn(curr->domain, ram_gfn); return rc; + } finish_access: if ( p_data != NULL ) @@ -223,6 +243,7 @@ static int hvmemul_do_io( } } + put_gfn(curr->domain, ram_gfn); return X86EMUL_OKAY; } @@ -671,12 +692,14 @@ static int hvmemul_rep_movs( if ( rc != X86EMUL_OKAY ) return rc; - (void)gfn_to_mfn(current->domain, sgpa >> PAGE_SHIFT, &p2mt); + /* Unlocked works here because we get_gfn for real in whatever + * we call later. */ + (void)get_gfn_unlocked(current->domain, sgpa >> PAGE_SHIFT, &p2mt); if ( !p2m_is_ram(p2mt) && !p2m_is_grant(p2mt) ) return hvmemul_do_mmio( sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL); - (void)gfn_to_mfn(current->domain, dgpa >> PAGE_SHIFT, &p2mt); + (void)get_gfn_unlocked(current->domain, dgpa >> PAGE_SHIFT, &p2mt); if ( !p2m_is_ram(p2mt) && !p2m_is_grant(p2mt) ) return hvmemul_do_mmio( dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL); diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -355,26 +355,37 @@ static int hvm_set_ioreq_page( unsigned long mfn; void *va; - mfn = mfn_x(gfn_to_mfn_unshare(d, gmfn, &p2mt)); + mfn = mfn_x(get_gfn_unshare(d, gmfn, &p2mt)); if ( !p2m_is_ram(p2mt) ) + { + put_gfn(d, gmfn); return -EINVAL; + } if ( p2m_is_paging(p2mt) ) { p2m_mem_paging_populate(d, gmfn); + put_gfn(d, gmfn); return -ENOENT; } if ( p2m_is_shared(p2mt) ) + { + put_gfn(d, gmfn); return -ENOENT; + } ASSERT(mfn_valid(mfn)); page = mfn_to_page(mfn); if ( !get_page_and_type(page, d, PGT_writable_page) ) + { + put_gfn(d, gmfn); return -EINVAL; + } va = map_domain_page_global(mfn); if ( va == NULL ) { put_page_and_type(page); + put_gfn(d, gmfn); return -ENOMEM; } @@ -385,6 +396,7 @@ static int hvm_set_ioreq_page( spin_unlock(&iorp->lock); unmap_domain_page_global(va); put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, gmfn); return -EINVAL; } @@ -392,6 +404,7 @@ static int hvm_set_ioreq_page( iorp->page = page; spin_unlock(&iorp->lock); + put_gfn(d, gmfn); domain_unpause(d); @@ -1182,6 +1195,7 @@ int hvm_hap_nested_page_fault(unsigned l mfn_t mfn; struct vcpu *v = current; struct p2m_domain *p2m; + int rc; /* On Nested Virtualization, walk the guest page table. * If this succeeds, all is fine. 
@@ -1255,8 +1269,8 @@ int hvm_hap_nested_page_fault(unsigned l if ( violation ) { p2m_mem_access_check(gpa, gla_valid, gla, access_r, access_w, access_x); - - return 1; + rc = 1; + goto out_put_gfn; } } @@ -1268,7 +1282,8 @@ int hvm_hap_nested_page_fault(unsigned l { if ( !handle_mmio() ) hvm_inject_exception(TRAP_gp_fault, 0, 0); - return 1; + rc = 1; + goto out_put_gfn; } #ifdef __x86_64__ @@ -1281,7 +1296,8 @@ int hvm_hap_nested_page_fault(unsigned l { ASSERT(!p2m_is_nestedp2m(p2m)); mem_sharing_unshare_page(p2m->domain, gfn, 0); - return 1; + rc = 1; + goto out_put_gfn; } #endif @@ -1295,7 +1311,8 @@ int hvm_hap_nested_page_fault(unsigned l */ paging_mark_dirty(v->domain, mfn_x(mfn)); p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw); - return 1; + rc = 1; + goto out_put_gfn; } /* Shouldn''t happen: Maybe the guest was writing to a r/o grant mapping? */ @@ -1304,10 +1321,14 @@ int hvm_hap_nested_page_fault(unsigned l gdprintk(XENLOG_WARNING, "trying to write to read-only grant mapping\n"); hvm_inject_exception(TRAP_gp_fault, 0, 0); - return 1; + rc = 1; + goto out_put_gfn; } - return 0; + rc = 0; +out_put_gfn: + put_gfn(p2m->domain, gfn); + return rc; } int hvm_handle_xsetbv(u64 new_bv) @@ -1530,10 +1551,11 @@ int hvm_set_cr0(unsigned long value) { /* The guest CR3 must be pointing to the guest physical. */ gfn = v->arch.hvm_vcpu.guest_cr[3]>>PAGE_SHIFT; - mfn = mfn_x(gfn_to_mfn(v->domain, gfn, &p2mt)); + mfn = mfn_x(get_gfn(v->domain, gfn, &p2mt)); if ( !p2m_is_ram(p2mt) || !mfn_valid(mfn) || !get_page(mfn_to_page(mfn), v->domain)) { + put_gfn(v->domain, gfn); gdprintk(XENLOG_ERR, "Invalid CR3 value = %lx (mfn=%lx)\n", v->arch.hvm_vcpu.guest_cr[3], mfn); domain_crash(v->domain); @@ -1545,6 +1567,7 @@ int hvm_set_cr0(unsigned long value) HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR3 value = %lx, mfn = %lx", v->arch.hvm_vcpu.guest_cr[3], mfn); + put_gfn(v->domain, gfn); } } else if ( !(value & X86_CR0_PG) && (old_value & X86_CR0_PG) ) @@ -1621,13 +1644,17 @@ int hvm_set_cr3(unsigned long value) { /* Shadow-mode CR3 change. Check PDBR and update refcounts. */ HVM_DBG_LOG(DBG_LEVEL_VMMU, "CR3 value = %lx", value); - mfn = mfn_x(gfn_to_mfn(v->domain, value >> PAGE_SHIFT, &p2mt)); + mfn = mfn_x(get_gfn(v->domain, value >> PAGE_SHIFT, &p2mt)); if ( !p2m_is_ram(p2mt) || !mfn_valid(mfn) || !get_page(mfn_to_page(mfn), v->domain) ) + { + put_gfn(v->domain, value >> PAGE_SHIFT); goto bad_cr3; + } put_page(pagetable_get_page(v->arch.guest_table)); v->arch.guest_table = pagetable_from_pfn(mfn); + put_gfn(v->domain, value >> PAGE_SHIFT); HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR3 value = %lx", value); } @@ -1764,6 +1791,8 @@ int hvm_virtual_to_linear_addr( return 0; } +/* We leave this function holding a lock on the p2m entry and a ref + * on the mapped mfn */ static void *__hvm_map_guest_frame(unsigned long gfn, bool_t writable) { unsigned long mfn; @@ -1771,13 +1800,17 @@ static void *__hvm_map_guest_frame(unsig struct domain *d = current->domain; mfn = mfn_x(writable - ? gfn_to_mfn_unshare(d, gfn, &p2mt) - : gfn_to_mfn(d, gfn, &p2mt)); + ? 
get_gfn_unshare(d, gfn, &p2mt) + : get_gfn(d, gfn, &p2mt)); if ( (p2m_is_shared(p2mt) && writable) || !p2m_is_ram(p2mt) ) + { + put_gfn(d, gfn); return NULL; + } if ( p2m_is_paging(p2mt) ) { p2m_mem_paging_populate(d, gfn); + put_gfn(d, gfn); return NULL; } @@ -1799,10 +1832,33 @@ void *hvm_map_guest_frame_ro(unsigned lo return __hvm_map_guest_frame(gfn, 0); } -void hvm_unmap_guest_frame(void *p) +void hvm_unmap_guest_frame(void *p, unsigned long addr, int is_va) { + /* We enter this function with a map obtained in __hvm_map_guest_frame. + * This map performed a p2m query that locked the gfn entry and got + * a ref on the mfn. Must undo */ if ( p ) + { + unsigned long gfn = INVALID_GFN; + + if ( is_va ) + { + if ( addr ) + { + uint32_t pfec = PFEC_page_present; + gfn = paging_gva_to_gfn(current, addr, &pfec); + } else { + gfn = INVALID_GFN; + } + } else { + gfn = addr; + } + + if ( gfn != INVALID_GFN ) + put_gfn(current->domain, gfn); + unmap_domain_page(p); + } } static void *hvm_map_entry(unsigned long va) @@ -1839,9 +1895,9 @@ static void *hvm_map_entry(unsigned long return NULL; } -static void hvm_unmap_entry(void *p) +static void hvm_unmap_entry(void *p, unsigned long va) { - hvm_unmap_guest_frame(p); + hvm_unmap_guest_frame(p, va, 1); } static int hvm_load_segment_selector( @@ -1853,6 +1909,7 @@ static int hvm_load_segment_selector( int fault_type = TRAP_invalid_tss; struct cpu_user_regs *regs = guest_cpu_user_regs(); struct vcpu *v = current; + unsigned long va_desc; if ( regs->eflags & X86_EFLAGS_VM ) { @@ -1886,7 +1943,8 @@ static int hvm_load_segment_selector( if ( ((sel & 0xfff8) + 7) > desctab.limit ) goto fail; - pdesc = hvm_map_entry(desctab.base + (sel & 0xfff8)); + va_desc = desctab.base + (sel & 0xfff8); + pdesc = hvm_map_entry(va_desc); if ( pdesc == NULL ) goto hvm_map_fail; @@ -1946,7 +2004,7 @@ static int hvm_load_segment_selector( desc.b |= 0x100; skip_accessed_flag: - hvm_unmap_entry(pdesc); + hvm_unmap_entry(pdesc, va_desc); segr.base = (((desc.b << 0) & 0xff000000u) | ((desc.b << 16) & 0x00ff0000u) | @@ -1962,7 +2020,7 @@ static int hvm_load_segment_selector( return 0; unmap_and_fail: - hvm_unmap_entry(pdesc); + hvm_unmap_entry(pdesc, va_desc); fail: hvm_inject_exception(fault_type, sel & 0xfffc, 0); hvm_map_fail: @@ -1977,7 +2035,7 @@ void hvm_task_switch( struct cpu_user_regs *regs = guest_cpu_user_regs(); struct segment_register gdt, tr, prev_tr, segr; struct desc_struct *optss_desc = NULL, *nptss_desc = NULL, tss_desc; - unsigned long eflags; + unsigned long eflags, va_optss = 0, va_nptss = 0; int exn_raised, rc; struct { u16 back_link,__blh; @@ -2003,11 +2061,13 @@ void hvm_task_switch( goto out; } - optss_desc = hvm_map_entry(gdt.base + (prev_tr.sel & 0xfff8)); + va_optss = gdt.base + (prev_tr.sel & 0xfff8); + optss_desc = hvm_map_entry(va_optss); if ( optss_desc == NULL ) goto out; - nptss_desc = hvm_map_entry(gdt.base + (tss_sel & 0xfff8)); + va_nptss = gdt.base + (tss_sel & 0xfff8); + nptss_desc = hvm_map_entry(va_nptss); if ( nptss_desc == NULL ) goto out; @@ -2172,8 +2232,8 @@ void hvm_task_switch( } out: - hvm_unmap_entry(optss_desc); - hvm_unmap_entry(nptss_desc); + hvm_unmap_entry(optss_desc, va_optss); + hvm_unmap_entry(nptss_desc, va_nptss); } #define HVMCOPY_from_guest (0u<<0) @@ -2230,19 +2290,29 @@ static enum hvm_copy_result __hvm_copy( gfn = addr >> PAGE_SHIFT; } - mfn = mfn_x(gfn_to_mfn_unshare(curr->domain, gfn, &p2mt)); + mfn = mfn_x(get_gfn_unshare(curr->domain, gfn, &p2mt)); if ( p2m_is_paging(p2mt) ) { 
p2m_mem_paging_populate(curr->domain, gfn); + put_gfn(curr->domain, gfn); return HVMCOPY_gfn_paged_out; } if ( p2m_is_shared(p2mt) ) + { + put_gfn(curr->domain, gfn); return HVMCOPY_gfn_shared; + } if ( p2m_is_grant(p2mt) ) + { + put_gfn(curr->domain, gfn); return HVMCOPY_unhandleable; + } if ( !p2m_is_ram(p2mt) ) + { + put_gfn(curr->domain, gfn); return HVMCOPY_bad_gfn_to_mfn; + } ASSERT(mfn_valid(mfn)); p = (char *)map_domain_page(mfn) + (addr & ~PAGE_MASK); @@ -2273,6 +2343,7 @@ static enum hvm_copy_result __hvm_copy( addr += count; buf += count; todo -= count; + put_gfn(curr->domain, gfn); } return HVMCOPY_okay; @@ -3690,11 +3761,11 @@ long do_hvm_op(unsigned long op, XEN_GUE for ( pfn = a.first_pfn; pfn < a.first_pfn + a.nr; pfn++ ) { p2m_type_t t; - mfn_t mfn = gfn_to_mfn(d, pfn, &t); + mfn_t mfn = get_gfn(d, pfn, &t); if ( p2m_is_paging(t) ) { p2m_mem_paging_populate(d, pfn); - + put_gfn(d, pfn); rc = -EINVAL; goto param_fail3; } @@ -3709,6 +3780,7 @@ long do_hvm_op(unsigned long op, XEN_GUE /* don''t take a long time and don''t die either */ sh_remove_shadows(d->vcpu[0], mfn, 1, 0); } + put_gfn(d, pfn); } param_fail3: @@ -3732,7 +3804,7 @@ long do_hvm_op(unsigned long op, XEN_GUE rc = -EINVAL; if ( is_hvm_domain(d) ) { - gfn_to_mfn_unshare(d, a.pfn, &t); + get_gfn_unshare_unlocked(d, a.pfn, &t); if ( p2m_is_mmio(t) ) a.mem_type = HVMMEM_mmio_dm; else if ( p2m_is_readonly(t) ) @@ -3785,20 +3857,23 @@ long do_hvm_op(unsigned long op, XEN_GUE p2m_type_t t; p2m_type_t nt; mfn_t mfn; - mfn = gfn_to_mfn_unshare(d, pfn, &t); + mfn = get_gfn_unshare(d, pfn, &t); if ( p2m_is_paging(t) ) { p2m_mem_paging_populate(d, pfn); + put_gfn(d, pfn); rc = -EINVAL; goto param_fail4; } if ( p2m_is_shared(t) ) { + put_gfn(d, pfn); rc = -EINVAL; goto param_fail4; } if ( p2m_is_grant(t) ) { + put_gfn(d, pfn); gdprintk(XENLOG_WARNING, "type for pfn 0x%lx changed to grant while " "we were working?\n", pfn); @@ -3809,6 +3884,7 @@ long do_hvm_op(unsigned long op, XEN_GUE nt = p2m_change_type(d, pfn, t, memtype[a.hvmmem_type]); if ( nt != t ) { + put_gfn(d, pfn); gdprintk(XENLOG_WARNING, "type of pfn 0x%lx changed from %d to %d while " "we were trying to change it to %d\n", @@ -3816,6 +3892,7 @@ long do_hvm_op(unsigned long op, XEN_GUE goto param_fail4; } } + put_gfn(d, pfn); } rc = 0; diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/mtrr.c --- a/xen/arch/x86/hvm/mtrr.c +++ b/xen/arch/x86/hvm/mtrr.c @@ -389,7 +389,7 @@ uint32_t get_pat_flags(struct vcpu *v, { struct domain *d = v->domain; p2m_type_t p2mt; - gfn_to_mfn_query(d, paddr_to_pfn(gpaddr), &p2mt); + get_gfn_query_unlocked(d, paddr_to_pfn(gpaddr), &p2mt); if (p2m_is_ram(p2mt)) gdprintk(XENLOG_WARNING, "Conflict occurs for a given guest l1e flags:%x " diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/nestedhvm.c --- a/xen/arch/x86/hvm/nestedhvm.c +++ b/xen/arch/x86/hvm/nestedhvm.c @@ -56,7 +56,7 @@ nestedhvm_vcpu_reset(struct vcpu *v) nv->nv_ioportED = 0; if (nv->nv_vvmcx) - hvm_unmap_guest_frame(nv->nv_vvmcx); + hvm_unmap_guest_frame(nv->nv_vvmcx, nv->nv_vvmcxaddr >> PAGE_SHIFT, 0); nv->nv_vvmcx = NULL; nv->nv_vvmcxaddr = VMCX_EADDR; nv->nv_flushp2m = 0; diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/stdvga.c --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -482,7 +482,7 @@ static int mmio_move(struct hvm_hw_stdvg if ( hvm_copy_to_guest_phys(data, &tmp, p->size) ! 
HVMCOPY_okay ) { - (void)gfn_to_mfn(d, data >> PAGE_SHIFT, &p2mt); + (void)get_gfn_unlocked(d, data >> PAGE_SHIFT, &p2mt); /* * The only case we handle is vga_mem <-> vga_mem. * Anything else disables caching and leaves it to qemu-dm. @@ -504,7 +504,7 @@ static int mmio_move(struct hvm_hw_stdvg if ( hvm_copy_from_guest_phys(&tmp, data, p->size) ! HVMCOPY_okay ) { - (void)gfn_to_mfn(d, data >> PAGE_SHIFT, &p2mt); + (void)get_gfn_unlocked(d, data >> PAGE_SHIFT, &p2mt); if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) || ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) ) return 0; diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/svm/nestedsvm.c --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -71,7 +71,7 @@ int nestedsvm_vmcb_map(struct vcpu *v, u if (nv->nv_vvmcx != NULL && nv->nv_vvmcxaddr != vmcbaddr) { ASSERT(nv->nv_vvmcx != NULL); ASSERT(nv->nv_vvmcxaddr != VMCX_EADDR); - hvm_unmap_guest_frame(nv->nv_vvmcx); + hvm_unmap_guest_frame(nv->nv_vvmcx, nv->nv_vvmcxaddr >> PAGE_SHIFT, 0); nv->nv_vvmcx = NULL; nv->nv_vvmcxaddr = VMCX_EADDR; } @@ -353,7 +353,7 @@ static int nsvm_vmrun_permissionmap(stru ASSERT(ns_viomap != NULL); ioport_80 = test_bit(0x80, ns_viomap); ioport_ed = test_bit(0xed, ns_viomap); - hvm_unmap_guest_frame(ns_viomap); + hvm_unmap_guest_frame(ns_viomap, svm->ns_iomap_pa >> PAGE_SHIFT, 0); svm->ns_iomap = nestedhvm_vcpu_iomap_get(ioport_80, ioport_ed); @@ -857,23 +857,25 @@ nsvm_vmcb_guest_intercepts_ioio(paddr_t ioio_info_t ioinfo; uint16_t port; bool_t enabled; + unsigned long gfn = 0; /* gcc ... */ ioinfo.bytes = exitinfo1; port = ioinfo.fields.port; switch (port) { case 0 ... 32767: /* first 4KB page */ - io_bitmap = hvm_map_guest_frame_ro(iopm_gfn); + gfn = iopm_gfn; break; case 32768 ... 
65535: /* second 4KB page */ port -= 32768; - io_bitmap = hvm_map_guest_frame_ro(iopm_gfn+1); + gfn = iopm_gfn + 1; break; default: BUG(); break; } + io_bitmap = hvm_map_guest_frame_ro(gfn); if (io_bitmap == NULL) { gdprintk(XENLOG_ERR, "IOIO intercept: mapping of permission map failed\n"); @@ -881,7 +883,7 @@ nsvm_vmcb_guest_intercepts_ioio(paddr_t } enabled = test_bit(port, io_bitmap); - hvm_unmap_guest_frame(io_bitmap); + hvm_unmap_guest_frame(io_bitmap, gfn, 0); if (!enabled) return NESTEDHVM_VMEXIT_HOST; diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/svm/svm.c --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -244,9 +244,10 @@ static int svm_vmcb_restore(struct vcpu { if ( c->cr0 & X86_CR0_PG ) { - mfn = mfn_x(gfn_to_mfn(v->domain, c->cr3 >> PAGE_SHIFT, &p2mt)); + mfn = mfn_x(get_gfn(v->domain, c->cr3 >> PAGE_SHIFT, &p2mt)); if ( !p2m_is_ram(p2mt) || !get_page(mfn_to_page(mfn), v->domain) ) { + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); gdprintk(XENLOG_ERR, "Invalid CR3 value=0x%"PRIx64"\n", c->cr3); return -EINVAL; @@ -257,6 +258,8 @@ static int svm_vmcb_restore(struct vcpu put_page(pagetable_get_page(v->arch.guest_table)); v->arch.guest_table = pagetable_from_pfn(mfn); + if ( c->cr0 & X86_CR0_PG ) + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); } v->arch.hvm_vcpu.guest_cr[0] = c->cr0 | X86_CR0_ET; @@ -1144,7 +1147,6 @@ static void svm_do_nested_pgfault(struct unsigned long gfn = gpa >> PAGE_SHIFT; mfn_t mfn; p2m_type_t p2mt; - p2m_access_t p2ma; struct p2m_domain *p2m = NULL; ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0); @@ -1161,7 +1163,8 @@ static void svm_do_nested_pgfault(struct p2m = p2m_get_p2m(v); _d.gpa = gpa; _d.qualification = 0; - _d.mfn = mfn_x(gfn_to_mfn_type_p2m(p2m, gfn, &_d.p2mt, &p2ma, p2m_query, NULL)); + mfn = get_gfn_query_unlocked(p2m->domain, gfn, &_d.p2mt); + _d.mfn = mfn_x(mfn); __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); } @@ -1181,7 +1184,7 @@ static void svm_do_nested_pgfault(struct if ( p2m == NULL ) p2m = p2m_get_p2m(v); /* Everything else is an error. 
*/ - mfn = gfn_to_mfn_type_p2m(p2m, gfn, &p2mt, &p2ma, p2m_guest, NULL); + mfn = get_gfn_guest_unlocked(p2m->domain, gfn, &p2mt); gdprintk(XENLOG_ERR, "SVM violation gpa %#"PRIpaddr", mfn %#lx, type %i\n", gpa, mfn_x(mfn), p2mt); diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/viridian.c --- a/xen/arch/x86/hvm/viridian.c +++ b/xen/arch/x86/hvm/viridian.c @@ -134,12 +134,13 @@ void dump_apic_assist(struct vcpu *v) static void enable_hypercall_page(struct domain *d) { unsigned long gmfn = d->arch.hvm_domain.viridian.hypercall_gpa.fields.pfn; - unsigned long mfn = gmfn_to_mfn(d, gmfn); + unsigned long mfn = get_gfn_untyped(d, gmfn); uint8_t *p; if ( !mfn_valid(mfn) || !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) { + put_gfn(d, gmfn); gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn, mfn); return; } @@ -162,13 +163,14 @@ static void enable_hypercall_page(struct unmap_domain_page(p); put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, gmfn); } void initialize_apic_assist(struct vcpu *v) { struct domain *d = v->domain; unsigned long gmfn = v->arch.hvm_vcpu.viridian.apic_assist.fields.pfn; - unsigned long mfn = gmfn_to_mfn(d, gmfn); + unsigned long mfn = get_gfn_untyped(d, gmfn); uint8_t *p; /* @@ -184,6 +186,7 @@ void initialize_apic_assist(struct vcpu if ( !mfn_valid(mfn) || !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) { + put_gfn(d, gmfn); gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx)\n", gmfn, mfn); return; } @@ -195,6 +198,7 @@ void initialize_apic_assist(struct vcpu unmap_domain_page(p); put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, gmfn); } int wrmsr_viridian_regs(uint32_t idx, uint64_t val) diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/vmx/vmx.c --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -487,9 +487,10 @@ static int vmx_restore_cr0_cr3( { if ( cr0 & X86_CR0_PG ) { - mfn = mfn_x(gfn_to_mfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt)); + mfn = mfn_x(get_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt)); if ( !p2m_is_ram(p2mt) || !get_page(mfn_to_page(mfn), v->domain) ) { + put_gfn(v->domain, cr3 >> PAGE_SHIFT); gdprintk(XENLOG_ERR, "Invalid CR3 value=0x%lx\n", cr3); return -EINVAL; } @@ -499,6 +500,8 @@ static int vmx_restore_cr0_cr3( put_page(pagetable_get_page(v->arch.guest_table)); v->arch.guest_table = pagetable_from_pfn(mfn); + if ( cr0 & X86_CR0_PG ) + put_gfn(v->domain, cr3 >> PAGE_SHIFT); } v->arch.hvm_vcpu.guest_cr[0] = cr0 | X86_CR0_ET; @@ -1007,9 +1010,12 @@ static void vmx_load_pdptrs(struct vcpu if ( cr3 & 0x1fUL ) goto crash; - mfn = mfn_x(gfn_to_mfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt)); + mfn = mfn_x(get_gfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt)); if ( !p2m_is_ram(p2mt) ) + { + put_gfn(v->domain, cr3 >> PAGE_SHIFT); goto crash; + } p = map_domain_page(mfn); @@ -1037,6 +1043,7 @@ static void vmx_load_pdptrs(struct vcpu vmx_vmcs_exit(v); unmap_domain_page(p); + put_gfn(v->domain, cr3 >> PAGE_SHIFT); return; crash: @@ -2088,7 +2095,7 @@ static void ept_handle_violation(unsigne _d.gpa = gpa; _d.qualification = qualification; - _d.mfn = mfn_x(gfn_to_mfn_query(d, gfn, &_d.p2mt)); + _d.mfn = mfn_x(get_gfn_query_unlocked(d, gfn, &_d.p2mt)); __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); } @@ -2104,7 +2111,7 @@ static void ept_handle_violation(unsigne return; /* Everything else is an error. 
*/ - mfn = gfn_to_mfn_guest(d, gfn, &p2mt); + mfn = get_gfn_guest_unlocked(d, gfn, &p2mt); gdprintk(XENLOG_ERR, "EPT violation %#lx (%c%c%c/%c%c%c), " "gpa %#"PRIpaddr", mfn %#lx, type %i.\n", qualification, diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/vmx/vvmx.c --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -558,8 +558,10 @@ static void __map_io_bitmap(struct vcpu index = vmcs_reg == IO_BITMAP_A ? 0 : 1; if (nvmx->iobitmap[index]) - hvm_unmap_guest_frame (nvmx->iobitmap[index]); + hvm_unmap_guest_frame (nvmx->iobitmap[index], + nvmx->iobitmap_gfn[index], 0); gpa = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, vmcs_reg); + nvmx->iobitmap_gfn[index] = gpa >> PAGE_SHIFT; nvmx->iobitmap[index] = hvm_map_guest_frame_ro (gpa >> PAGE_SHIFT); } @@ -577,13 +579,16 @@ static void nvmx_purge_vvmcs(struct vcpu __clear_current_vvmcs(v); if ( nvcpu->nv_vvmcxaddr != VMCX_EADDR ) - hvm_unmap_guest_frame (nvcpu->nv_vvmcx); + hvm_unmap_guest_frame(nvcpu->nv_vvmcx, + nvcpu->nv_vvmcxaddr >> PAGE_SHIFT, 0); nvcpu->nv_vvmcx == NULL; nvcpu->nv_vvmcxaddr = VMCX_EADDR; for (i=0; i<2; i++) { if ( nvmx->iobitmap[i] ) { - hvm_unmap_guest_frame (nvmx->iobitmap[i]); + hvm_unmap_guest_frame(nvmx->iobitmap[i], + nvmx->iobitmap_gfn[i], 0); nvmx->iobitmap[i] = NULL; + nvmx->iobitmap_gfn[i] = 0; } } } @@ -1198,7 +1203,7 @@ int nvmx_handle_vmclear(struct cpu_user_ vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT); if ( vvmcs ) __set_vvmcs(vvmcs, NVMX_LAUNCH_STATE, 0); - hvm_unmap_guest_frame(vvmcs); + hvm_unmap_guest_frame(vvmcs, gpa >> PAGE_SHIFT, 0); } vmreturn(regs, VMSUCCEED); diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -663,13 +663,19 @@ int map_ldt_shadow_page(unsigned int off return 0; gmfn = l1e_get_pfn(l1e); - mfn = gmfn_to_mfn(d, gmfn); + mfn = get_gfn_untyped(d, gmfn); if ( unlikely(!mfn_valid(mfn)) ) + { + put_gfn(d, gmfn); return 0; + } okay = get_page_and_type(mfn_to_page(mfn), d, PGT_seg_desc_page); if ( unlikely(!okay) ) + { + put_gfn(d, gmfn); return 0; + } nl1e = l1e_from_pfn(mfn, l1e_get_flags(l1e) | _PAGE_RW); @@ -678,6 +684,7 @@ int map_ldt_shadow_page(unsigned int off v->arch.pv_vcpu.shadow_ldt_mapcnt++; spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); + put_gfn(d, gmfn); return 1; } @@ -1798,7 +1805,6 @@ static int mod_l1_entry(l1_pgentry_t *pl { l1_pgentry_t ol1e; struct domain *pt_dom = pt_vcpu->domain; - unsigned long mfn; p2m_type_t p2mt; int rc = 0; @@ -1815,9 +1821,14 @@ static int mod_l1_entry(l1_pgentry_t *pl if ( l1e_get_flags(nl1e) & _PAGE_PRESENT ) { /* Translate foreign guest addresses. 
*/ - mfn = mfn_x(gfn_to_mfn(pg_dom, l1e_get_pfn(nl1e), &p2mt)); + unsigned long mfn, gfn; + gfn = l1e_get_pfn(nl1e); + mfn = mfn_x(get_gfn(pg_dom, gfn, &p2mt)); if ( !p2m_is_ram(p2mt) || unlikely(mfn == INVALID_MFN) ) + { + put_gfn(pg_dom, gfn); return -EINVAL; + } ASSERT((mfn & ~(PADDR_MASK >> PAGE_SHIFT)) == 0); nl1e = l1e_from_pfn(mfn, l1e_get_flags(nl1e)); @@ -1825,6 +1836,7 @@ static int mod_l1_entry(l1_pgentry_t *pl { MEM_LOG("Bad L1 flags %x", l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)); + put_gfn(pg_dom, gfn); return -EINVAL; } @@ -1835,12 +1847,14 @@ static int mod_l1_entry(l1_pgentry_t *pl if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) ) return 0; + put_gfn(pg_dom, gfn); return -EBUSY; } switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) ) { default: + put_gfn(pg_dom, gfn); return rc; case 0: break; @@ -1856,6 +1870,7 @@ static int mod_l1_entry(l1_pgentry_t *pl ol1e = nl1e; rc = -EBUSY; } + put_gfn(pg_dom, gfn); } else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad)) ) @@ -3023,7 +3038,7 @@ int do_mmuext_op( if ( paging_mode_refcounts(pg_owner) ) break; - mfn = gmfn_to_mfn(pg_owner, op.arg1.mfn); + mfn = get_gfn_untyped(pg_owner, op.arg1.mfn); rc = get_page_and_type_from_pagenr(mfn, type, pg_owner, 0, 1); okay = !rc; if ( unlikely(!okay) ) @@ -3032,6 +3047,7 @@ int do_mmuext_op( rc = -EAGAIN; else if ( rc != -EAGAIN ) MEM_LOG("Error while pinning mfn %lx", mfn); + put_gfn(pg_owner, op.arg1.mfn); break; } @@ -3040,6 +3056,7 @@ int do_mmuext_op( if ( (rc = xsm_memory_pin_page(d, page)) != 0 ) { put_page_and_type(page); + put_gfn(pg_owner, op.arg1.mfn); okay = 0; break; } @@ -3049,6 +3066,7 @@ int do_mmuext_op( { MEM_LOG("Mfn %lx already pinned", mfn); put_page_and_type(page); + put_gfn(pg_owner, op.arg1.mfn); okay = 0; break; } @@ -3067,6 +3085,7 @@ int do_mmuext_op( spin_unlock(&pg_owner->page_alloc_lock); if ( drop_ref ) put_page_and_type(page); + put_gfn(pg_owner, op.arg1.mfn); } break; @@ -3079,9 +3098,10 @@ int do_mmuext_op( if ( paging_mode_refcounts(pg_owner) ) break; - mfn = gmfn_to_mfn(pg_owner, op.arg1.mfn); + mfn = get_gfn_untyped(pg_owner, op.arg1.mfn); if ( unlikely(!(okay = get_page_from_pagenr(mfn, pg_owner))) ) { + put_gfn(pg_owner, op.arg1.mfn); MEM_LOG("Mfn %lx bad domain", mfn); break; } @@ -3092,6 +3112,7 @@ int do_mmuext_op( { okay = 0; put_page(page); + put_gfn(pg_owner, op.arg1.mfn); MEM_LOG("Mfn %lx not pinned", mfn); break; } @@ -3102,18 +3123,20 @@ int do_mmuext_op( /* A page is dirtied when its pin status is cleared. 
*/ paging_mark_dirty(pg_owner, mfn); + put_gfn(pg_owner, op.arg1.mfn); break; } case MMUEXT_NEW_BASEPTR: - okay = new_guest_cr3(gmfn_to_mfn(d, op.arg1.mfn)); + okay = new_guest_cr3(get_gfn_untyped(d, op.arg1.mfn)); + put_gfn(d, op.arg1.mfn); break; #ifdef __x86_64__ case MMUEXT_NEW_USER_BASEPTR: { unsigned long old_mfn, mfn; - mfn = gmfn_to_mfn(d, op.arg1.mfn); + mfn = get_gfn_untyped(d, op.arg1.mfn); if ( mfn != 0 ) { if ( paging_mode_refcounts(d) ) @@ -3123,6 +3146,7 @@ int do_mmuext_op( mfn, PGT_root_page_table, d, 0, 0); if ( unlikely(!okay) ) { + put_gfn(d, op.arg1.mfn); MEM_LOG("Error while installing new mfn %lx", mfn); break; } @@ -3130,6 +3154,7 @@ int do_mmuext_op( old_mfn = pagetable_get_pfn(curr->arch.guest_table_user); curr->arch.guest_table_user = pagetable_from_pfn(mfn); + put_gfn(d, op.arg1.mfn); if ( old_mfn != 0 ) { @@ -3247,11 +3272,12 @@ int do_mmuext_op( unsigned long mfn; unsigned char *ptr; - mfn = gmfn_to_mfn(d, op.arg1.mfn); + mfn = get_gfn_untyped(d, op.arg1.mfn); okay = !get_page_and_type_from_pagenr( mfn, PGT_writable_page, d, 0, 0); if ( unlikely(!okay) ) { + put_gfn(d, op.arg1.mfn); MEM_LOG("Error while clearing mfn %lx", mfn); break; } @@ -3264,6 +3290,7 @@ int do_mmuext_op( fixunmap_domain_page(ptr); put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, op.arg1.mfn); break; } @@ -3273,20 +3300,23 @@ int do_mmuext_op( unsigned char *dst; unsigned long src_mfn, mfn; - src_mfn = gmfn_to_mfn(d, op.arg2.src_mfn); + src_mfn = get_gfn_untyped(d, op.arg2.src_mfn); okay = get_page_from_pagenr(src_mfn, d); if ( unlikely(!okay) ) { + put_gfn(d, op.arg2.src_mfn); MEM_LOG("Error while copying from mfn %lx", src_mfn); break; } - mfn = gmfn_to_mfn(d, op.arg1.mfn); + mfn = get_gfn_untyped(d, op.arg1.mfn); okay = !get_page_and_type_from_pagenr( mfn, PGT_writable_page, d, 0, 0); if ( unlikely(!okay) ) { + put_gfn(d, op.arg1.mfn); put_page(mfn_to_page(src_mfn)); + put_gfn(d, op.arg2.src_mfn); MEM_LOG("Error while copying to mfn %lx", mfn); break; } @@ -3301,7 +3331,9 @@ int do_mmuext_op( unmap_domain_page(src); put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, op.arg1.mfn); put_page(mfn_to_page(src_mfn)); + put_gfn(d, op.arg2.src_mfn); break; } @@ -3489,14 +3521,14 @@ int do_mmu_update( req.ptr -= cmd; gmfn = req.ptr >> PAGE_SHIFT; - mfn = mfn_x(gfn_to_mfn(pt_owner, gmfn, &p2mt)); + mfn = mfn_x(get_gfn(pt_owner, gmfn, &p2mt)); if ( !p2m_is_valid(p2mt) ) - mfn = INVALID_MFN; + mfn = INVALID_MFN; if ( p2m_is_paged(p2mt) ) { p2m_mem_paging_populate(pg_owner, gmfn); - + put_gfn(pt_owner, gmfn); rc = -ENOENT; break; } @@ -3504,6 +3536,7 @@ int do_mmu_update( if ( unlikely(!get_page_from_pagenr(mfn, pt_owner)) ) { MEM_LOG("Could not get page for normal update"); + put_gfn(pt_owner, gmfn); break; } @@ -3516,6 +3549,7 @@ int do_mmu_update( if ( rc ) { unmap_domain_page_with_cache(va, &mapcache); put_page(page); + put_gfn(pt_owner, gmfn); break; } @@ -3527,16 +3561,20 @@ int do_mmu_update( { l1_pgentry_t l1e = l1e_from_intpte(req.val); p2m_type_t l1e_p2mt; - gfn_to_mfn(pg_owner, l1e_get_pfn(l1e), &l1e_p2mt); + unsigned long l1egfn = l1e_get_pfn(l1e), l1emfn; + + l1emfn = mfn_x(get_gfn(pg_owner, l1egfn, &l1e_p2mt)); if ( p2m_is_paged(l1e_p2mt) ) { p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); + put_gfn(pg_owner, l1egfn); rc = -ENOENT; break; } else if ( p2m_ram_paging_in_start == l1e_p2mt && !mfn_valid(mfn) ) { + put_gfn(pg_owner, l1egfn); rc = -ENOENT; break; } @@ -3553,7 +3591,10 @@ int do_mmu_update( l1e_get_pfn(l1e), 0); if ( rc ) + { + put_gfn(pg_owner, l1egfn); break; + } } } 
#endif @@ -3561,27 +3602,33 @@ int do_mmu_update( rc = mod_l1_entry(va, l1e, mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, v, pg_owner); + put_gfn(pg_owner, l1egfn); } break; case PGT_l2_page_table: { l2_pgentry_t l2e = l2e_from_intpte(req.val); p2m_type_t l2e_p2mt; - gfn_to_mfn(pg_owner, l2e_get_pfn(l2e), &l2e_p2mt); + unsigned long l2egfn = l2e_get_pfn(l2e), l2emfn; + + l2emfn = mfn_x(get_gfn(pg_owner, l2egfn, &l2e_p2mt)); if ( p2m_is_paged(l2e_p2mt) ) { - p2m_mem_paging_populate(pg_owner, l2e_get_pfn(l2e)); + p2m_mem_paging_populate(pg_owner, l2egfn); + put_gfn(pg_owner, l2egfn); rc = -ENOENT; break; } else if ( p2m_ram_paging_in_start == l2e_p2mt && !mfn_valid(mfn) ) { + put_gfn(pg_owner, l2egfn); rc = -ENOENT; break; } else if ( p2m_ram_shared == l2e_p2mt ) { + put_gfn(pg_owner, l2egfn); MEM_LOG("Unexpected attempt to map shared page.\n"); break; } @@ -3589,33 +3636,40 @@ int do_mmu_update( rc = mod_l2_entry(va, l2e, mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, v); + put_gfn(pg_owner, l2egfn); } break; case PGT_l3_page_table: { l3_pgentry_t l3e = l3e_from_intpte(req.val); p2m_type_t l3e_p2mt; - gfn_to_mfn(pg_owner, l3e_get_pfn(l3e), &l3e_p2mt); + unsigned long l3egfn = l3e_get_pfn(l3e), l3emfn; + + l3emfn = mfn_x(get_gfn(pg_owner, l3egfn, &l3e_p2mt)); if ( p2m_is_paged(l3e_p2mt) ) { - p2m_mem_paging_populate(pg_owner, l3e_get_pfn(l3e)); + p2m_mem_paging_populate(pg_owner, l3egfn); + put_gfn(pg_owner, l3egfn); rc = -ENOENT; break; } else if ( p2m_ram_paging_in_start == l3e_p2mt && !mfn_valid(mfn) ) { + put_gfn(pg_owner, l3egfn); rc = -ENOENT; break; } else if ( p2m_ram_shared == l3e_p2mt ) { + put_gfn(pg_owner, l3egfn); MEM_LOG("Unexpected attempt to map shared page.\n"); break; } rc = mod_l3_entry(va, l3e, mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, 1, v); + put_gfn(pg_owner, l3egfn); } break; #if CONFIG_PAGING_LEVELS >= 4 @@ -3623,27 +3677,33 @@ int do_mmu_update( { l4_pgentry_t l4e = l4e_from_intpte(req.val); p2m_type_t l4e_p2mt; - gfn_to_mfn(pg_owner, l4e_get_pfn(l4e), &l4e_p2mt); + unsigned long l4egfn = l4e_get_pfn(l4e), l4emfn; + + l4emfn = mfn_x(get_gfn(pg_owner, l4egfn, &l4e_p2mt)); if ( p2m_is_paged(l4e_p2mt) ) { - p2m_mem_paging_populate(pg_owner, l4e_get_pfn(l4e)); + p2m_mem_paging_populate(pg_owner, l4egfn); + put_gfn(pg_owner, l4egfn); rc = -ENOENT; break; } else if ( p2m_ram_paging_in_start == l4e_p2mt && !mfn_valid(mfn) ) { + put_gfn(pg_owner, l4egfn); rc = -ENOENT; break; } else if ( p2m_ram_shared == l4e_p2mt ) { + put_gfn(pg_owner, l4egfn); MEM_LOG("Unexpected attempt to map shared page.\n"); break; } rc = mod_l4_entry(va, l4e, mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, 1, v); + put_gfn(pg_owner, l4egfn); } break; #endif @@ -3667,6 +3727,7 @@ int do_mmu_update( unmap_domain_page_with_cache(va, &mapcache); put_page(page); + put_gfn(pt_owner, gmfn); } break; @@ -3753,10 +3814,11 @@ static int create_grant_pte_mapping( adjust_guest_l1e(nl1e, d); gmfn = pte_addr >> PAGE_SHIFT; - mfn = gmfn_to_mfn(d, gmfn); + mfn = get_gfn_untyped(d, gmfn); if ( unlikely(!get_page_from_pagenr(mfn, current->domain)) ) { + put_gfn(d, gmfn); MEM_LOG("Could not get page for normal update"); return GNTST_general_error; } @@ -3794,6 +3856,7 @@ static int create_grant_pte_mapping( failed: unmap_domain_page(va); put_page(page); + put_gfn(d, gmfn); return rc; } @@ -3808,10 +3871,11 @@ static int destroy_grant_pte_mapping( l1_pgentry_t ol1e; gmfn = addr >> PAGE_SHIFT; - mfn = gmfn_to_mfn(d, gmfn); + mfn = get_gfn_untyped(d, gmfn); if ( unlikely(!get_page_from_pagenr(mfn, current->domain)) ) { + put_gfn(d, gmfn); 
MEM_LOG("Could not get page for normal update"); return GNTST_general_error; } @@ -3863,6 +3927,7 @@ static int destroy_grant_pte_mapping( failed: unmap_domain_page(va); put_page(page); + put_gfn(d, gmfn); return rc; } @@ -4054,9 +4119,10 @@ static int replace_grant_p2m_mapping( if ( new_addr != 0 || (flags & GNTMAP_contains_pte) ) return GNTST_general_error; - old_mfn = gfn_to_mfn(d, gfn, &type); + old_mfn = get_gfn(d, gfn, &type); if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame ) { + put_gfn(d, gfn); gdprintk(XENLOG_WARNING, "replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)\n", type, mfn_x(old_mfn), frame); @@ -4064,6 +4130,7 @@ static int replace_grant_p2m_mapping( } guest_physmap_remove_page(d, gfn, frame, 0); + put_gfn(d, gfn); return GNTST_okay; } @@ -4444,15 +4511,20 @@ long set_gdt(struct vcpu *v, struct domain *d = v->domain; /* NB. There are 512 8-byte entries per GDT page. */ int i, nr_pages = (entries + 511) / 512; - unsigned long mfn; + unsigned long mfn, *pfns; if ( entries > FIRST_RESERVED_GDT_ENTRY ) return -EINVAL; + pfns = xmalloc_array(unsigned long, nr_pages); + if ( !pfns ) + return -ENOMEM; + /* Check the pages in the new GDT. */ for ( i = 0; i < nr_pages; i++ ) { - mfn = frames[i] = gmfn_to_mfn(d, frames[i]); + pfns[i] = frames[i]; + mfn = frames[i] = get_gfn_untyped(d, frames[i]); if ( !mfn_valid(mfn) || !get_page_and_type(mfn_to_page(mfn), d, PGT_seg_desc_page) ) goto fail; @@ -4468,13 +4540,19 @@ long set_gdt(struct vcpu *v, v->arch.pv_vcpu.gdt_frames[i] = frames[i]; l1e_write(&v->arch.perdomain_ptes[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR)); + put_gfn(d, pfns[i]); } + xfree(pfns); return 0; fail: while ( i-- > 0 ) + { put_page_and_type(mfn_to_page(frames[i])); + put_gfn(d, pfns[i]); + } + xfree(pfns); return -EINVAL; } @@ -4518,15 +4596,21 @@ long do_update_descriptor(u64 pa, u64 de *(u64 *)&d = desc; - mfn = gmfn_to_mfn(dom, gmfn); + mfn = get_gfn_untyped(dom, gmfn); if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) || !mfn_valid(mfn) || !check_descriptor(dom, &d) ) + { + put_gfn(dom, gmfn); return -EINVAL; + } page = mfn_to_page(mfn); if ( unlikely(!get_page(page, dom)) ) + { + put_gfn(dom, gmfn); return -EINVAL; + } /* Check if the given frame is in use in an unsafe context. */ switch ( page->u.inuse.type_info & PGT_type_mask ) @@ -4554,6 +4638,7 @@ long do_update_descriptor(u64 pa, u64 de out: put_page(page); + put_gfn(dom, gmfn); return ret; } @@ -4595,6 +4680,7 @@ static int handle_iomem_range(unsigned l long arch_memory_op(int op, XEN_GUEST_HANDLE(void) arg) { struct page_info *page = NULL; + unsigned long gfn = 0; /* gcc ... */ int rc; switch ( op ) @@ -4652,11 +4738,13 @@ long arch_memory_op(int op, XEN_GUEST_HA case XENMAPSPACE_gmfn: { p2m_type_t p2mt; - - xatp.idx = mfn_x(gfn_to_mfn_unshare(d, xatp.idx, &p2mt)); + gfn = xatp.idx; + + xatp.idx = mfn_x(get_gfn_unshare(d, xatp.idx, &p2mt)); /* If the page is still shared, exit early */ if ( p2m_is_shared(p2mt) ) { + put_gfn(d, gfn); rcu_unlock_domain(d); return -ENOMEM; } @@ -4674,6 +4762,8 @@ long arch_memory_op(int op, XEN_GUEST_HA { if ( page ) put_page(page); + if ( xatp.space == XENMAPSPACE_gmfn ) + put_gfn(d, gfn); rcu_unlock_domain(d); return -EINVAL; } @@ -4684,7 +4774,7 @@ long arch_memory_op(int op, XEN_GUEST_HA put_page(page); /* Remove previously mapped page if it was present. 
*/ - prev_mfn = gmfn_to_mfn(d, xatp.gpfn); + prev_mfn = get_gfn_untyped(d, xatp.gpfn); if ( mfn_valid(prev_mfn) ) { if ( is_xen_heap_mfn(prev_mfn) ) @@ -4694,6 +4784,8 @@ long arch_memory_op(int op, XEN_GUEST_HA /* Normal domain memory is freed, to avoid leaking memory. */ guest_remove_page(d, xatp.gpfn); } + /* In the XENMAPSPACE_gmfn case we still hold a ref on the old page. */ + put_gfn(d, xatp.gpfn); /* Unmap from old location, if any. */ gpfn = get_gpfn_from_mfn(mfn); @@ -4704,6 +4796,9 @@ long arch_memory_op(int op, XEN_GUEST_HA /* Map at new location. */ rc = guest_physmap_add_page(d, xatp.gpfn, mfn, 0); + /* In the XENMAPSPACE_gmfn, we took a ref and locked the p2m at the top */ + if ( xatp.space == XENMAPSPACE_gmfn ) + put_gfn(d, gfn); domain_unlock(d); rcu_unlock_domain(d); diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/guest_walk.c --- a/xen/arch/x86/mm/guest_walk.c +++ b/xen/arch/x86/mm/guest_walk.c @@ -86,30 +86,33 @@ static uint32_t set_ad_bits(void *guest_ return 0; } +/* If the map is non-NULL, we leave this function having + * called get_gfn, you need to call put_gfn. */ static inline void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, mfn_t *mfn, p2m_type_t *p2mt, uint32_t *rc) { - p2m_access_t a; - /* Translate the gfn, unsharing if shared */ - *mfn = gfn_to_mfn_type_p2m(p2m, gfn_x(gfn), p2mt, &a, p2m_unshare, NULL); + *mfn = get_gfn_unshare(p2m->domain, gfn_x(gfn), p2mt); if ( p2m_is_paging(*p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); + put_gfn(p2m->domain, gfn_x(gfn)); *rc = _PAGE_PAGED; return NULL; } if ( p2m_is_shared(*p2mt) ) { + put_gfn(p2m->domain, gfn_x(gfn)); *rc = _PAGE_SHARED; return NULL; } if ( !p2m_is_ram(*p2mt) ) { + put_gfn(p2m->domain, gfn_x(gfn)); *rc |= _PAGE_PRESENT; return NULL; } @@ -120,6 +123,9 @@ static inline void *map_domain_gfn(struc /* Walk the guest pagetables, after the manner of a hardware walker. */ +/* Because the walk is essentially random, it can cause a deadlock + * warning in the p2m locking code. Highly unlikely this is an actual + * deadlock, because who would walk page table in the opposite order? 
*/ uint32_t guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va, walk_t *gw, @@ -347,12 +353,24 @@ set_ad: out: #if GUEST_PAGING_LEVELS == 4 - if ( l3p ) unmap_domain_page(l3p); + if ( l3p ) + { + unmap_domain_page(l3p); + put_gfn(p2m->domain, gfn_x(guest_l4e_get_gfn(gw->l4e))); + } #endif #if GUEST_PAGING_LEVELS >= 3 - if ( l2p ) unmap_domain_page(l2p); + if ( l2p ) + { + unmap_domain_page(l2p); + put_gfn(p2m->domain, gfn_x(guest_l3e_get_gfn(gw->l3e))); + } #endif - if ( l1p ) unmap_domain_page(l1p); + if ( l1p ) + { + unmap_domain_page(l1p); + put_gfn(p2m->domain, gfn_x(guest_l2e_get_gfn(gw->l2e))); + } return rc; } diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/hap/guest_walk.c --- a/xen/arch/x86/mm/hap/guest_walk.c +++ b/xen/arch/x86/mm/hap/guest_walk.c @@ -54,28 +54,31 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA mfn_t top_mfn; void *top_map; p2m_type_t p2mt; - p2m_access_t p2ma; walk_t gw; + unsigned long top_gfn; /* Get the top-level table''s MFN */ - top_mfn = gfn_to_mfn_type_p2m(p2m, cr3 >> PAGE_SHIFT, - &p2mt, &p2ma, p2m_unshare, NULL); + top_gfn = cr3 >> PAGE_SHIFT; + top_mfn = get_gfn_unshare(p2m->domain, top_gfn, &p2mt); if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); p2m_mem_paging_populate(p2m->domain, cr3 >> PAGE_SHIFT); pfec[0] = PFEC_page_paged; + put_gfn(p2m->domain, top_gfn); return INVALID_GFN; } if ( p2m_is_shared(p2mt) ) { pfec[0] = PFEC_page_shared; + put_gfn(p2m->domain, top_gfn); return INVALID_GFN; } if ( !p2m_is_ram(p2mt) ) { pfec[0] &= ~PFEC_page_present; + put_gfn(p2m->domain, top_gfn); return INVALID_GFN; } @@ -87,26 +90,31 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA #endif missing = guest_walk_tables(v, p2m, ga, &gw, pfec[0], top_mfn, top_map); unmap_domain_page(top_map); + put_gfn(p2m->domain, top_gfn); /* Interpret the answer */ if ( missing == 0 ) { gfn_t gfn = guest_l1e_get_gfn(gw.l1e); - gfn_to_mfn_type_p2m(p2m, gfn_x(gfn), &p2mt, &p2ma, p2m_unshare, NULL); + (void)get_gfn_unshare(p2m->domain, gfn_x(gfn), &p2mt); if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); pfec[0] = PFEC_page_paged; + put_gfn(p2m->domain, gfn_x(gfn)); return INVALID_GFN; } if ( p2m_is_shared(p2mt) ) { pfec[0] = PFEC_page_shared; + put_gfn(p2m->domain, gfn_x(gfn)); return INVALID_GFN; } + put_gfn(p2m->domain, gfn_x(gfn)); + if ( page_order ) *page_order = guest_walk_to_page_order(&gw); diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/hap/nested_hap.c --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -146,22 +146,29 @@ nestedhap_walk_L0_p2m(struct p2m_domain mfn_t mfn; p2m_type_t p2mt; p2m_access_t p2ma; + int rc; /* walk L0 P2M table */ mfn = gfn_to_mfn_type_p2m(p2m, L1_gpa >> PAGE_SHIFT, &p2mt, &p2ma, p2m_query, page_order); + rc = NESTEDHVM_PAGEFAULT_MMIO; if ( p2m_is_mmio(p2mt) ) - return NESTEDHVM_PAGEFAULT_MMIO; + goto out; + rc = NESTEDHVM_PAGEFAULT_ERROR; if ( p2m_is_paging(p2mt) || p2m_is_shared(p2mt) || !p2m_is_ram(p2mt) ) - return NESTEDHVM_PAGEFAULT_ERROR; + goto out; + rc = NESTEDHVM_PAGEFAULT_ERROR; if ( !mfn_valid(mfn) ) - return NESTEDHVM_PAGEFAULT_ERROR; + goto out; *L0_gpa = (mfn_x(mfn) << PAGE_SHIFT) + (L1_gpa & ~PAGE_MASK); - return NESTEDHVM_PAGEFAULT_DONE; + rc = NESTEDHVM_PAGEFAULT_DONE; +out: + put_gfn(p2m->domain, L1_gpa >> PAGE_SHIFT); + return rc; } /* This function uses L2_gpa to walk the P2M page table in L1. 
If the diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -47,7 +47,7 @@ static int mem_event_enable(struct domai unsigned long ring_addr = mec->ring_addr; unsigned long shared_addr = mec->shared_addr; l1_pgentry_t l1e; - unsigned long gfn; + unsigned long shared_gfn = 0, ring_gfn = 0; /* gcc ... */ p2m_type_t p2mt; mfn_t ring_mfn; mfn_t shared_mfn; @@ -60,23 +60,36 @@ static int mem_event_enable(struct domai /* Get MFN of ring page */ guest_get_eff_l1e(v, ring_addr, &l1e); - gfn = l1e_get_pfn(l1e); - ring_mfn = gfn_to_mfn(dom_mem_event, gfn, &p2mt); + ring_gfn = l1e_get_pfn(l1e); + /* We''re grabbing these two in an order that could deadlock + * dom0 if 1. it were an hvm 2. there were two concurrent + * enables 3. the two gfn''s in each enable criss-crossed + * 2MB regions. Duly noted.... */ + ring_mfn = get_gfn(dom_mem_event, ring_gfn, &p2mt); if ( unlikely(!mfn_valid(mfn_x(ring_mfn))) ) + { + put_gfn(dom_mem_event, ring_gfn); return -EINVAL; + } /* Get MFN of shared page */ guest_get_eff_l1e(v, shared_addr, &l1e); - gfn = l1e_get_pfn(l1e); - shared_mfn = gfn_to_mfn(dom_mem_event, gfn, &p2mt); + shared_gfn = l1e_get_pfn(l1e); + shared_mfn = get_gfn(dom_mem_event, shared_gfn, &p2mt); if ( unlikely(!mfn_valid(mfn_x(shared_mfn))) ) + { + put_gfn(dom_mem_event, ring_gfn); + put_gfn(dom_mem_event, shared_gfn); return -EINVAL; + } /* Map ring and shared pages */ med->ring_page = map_domain_page(mfn_x(ring_mfn)); med->shared_page = map_domain_page(mfn_x(shared_mfn)); + put_gfn(dom_mem_event, ring_gfn); + put_gfn(dom_mem_event, shared_gfn); /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -227,7 +227,7 @@ static void mem_sharing_audit(void) g->domain, g->gfn, mfn_x(e->mfn)); continue; } - mfn = gfn_to_mfn(d, g->gfn, &t); + mfn = get_gfn_unlocked(d, g->gfn, &t); if(mfn_x(mfn) != mfn_x(e->mfn)) MEM_SHARING_DEBUG("Incorrect P2M for d=%d, PFN=%lx." "Expecting MFN=%ld, got %ld\n", @@ -335,7 +335,7 @@ int mem_sharing_debug_gfn(struct domain p2m_type_t p2mt; mfn_t mfn; - mfn = gfn_to_mfn(d, gfn, &p2mt); + mfn = get_gfn_unlocked(d, gfn, &p2mt); printk("Debug for domain=%d, gfn=%lx, ", d->domain_id, @@ -460,7 +460,7 @@ int mem_sharing_nominate_page(struct dom *phandle = 0UL; shr_lock(); - mfn = gfn_to_mfn(d, gfn, &p2mt); + mfn = get_gfn(d, gfn, &p2mt); /* Check if mfn is valid */ ret = -EINVAL; @@ -524,6 +524,7 @@ int mem_sharing_nominate_page(struct dom ret = 0; out: + put_gfn(d, gfn); shr_unlock(); return ret; } @@ -593,14 +594,18 @@ int mem_sharing_unshare_page(struct doma shr_handle_t handle; struct list_head *le; + /* Remove the gfn_info from the list */ + + /* This is one of the reasons why we can''t enforce ordering + * between shr_lock and p2m fine-grained locks in mm-lock. + * Callers may walk in here already holding the lock for this gfn */ shr_lock(); mem_sharing_audit(); - - /* Remove the gfn_info from the list */ - mfn = gfn_to_mfn(d, gfn, &p2mt); + mfn = get_gfn(d, gfn, &p2mt); /* Has someone already unshared it? */ if (!p2m_is_shared(p2mt)) { + put_gfn(d, gfn); shr_unlock(); return 0; } @@ -634,6 +639,7 @@ gfn_found: /* Even though we don''t allocate a private page, we have to account * for the MFN that originally backed this PFN. 
*/ atomic_dec(&nr_saved_mfns); + put_gfn(d, gfn); shr_unlock(); put_page_and_type(page); if(last_gfn && @@ -653,6 +659,7 @@ gfn_found: /* We''ve failed to obtain memory for private page. Need to re-add the * gfn_info to relevant list */ list_add(&gfn_info->list, &hash_entry->gfns); + put_gfn(d, gfn); shr_unlock(); return -ENOMEM; } @@ -663,6 +670,13 @@ gfn_found: unmap_domain_page(s); unmap_domain_page(t); + /* NOTE: set_shared_p2m_entry will switch the underlying mfn. If + * we do get_page withing get_gfn, the correct sequence here + * should be + get_page(page); + put_page(old_page); + * so that the ref to the old page is dropped, and a ref to + * the new page is obtained to later be dropped in put_gfn */ BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0); put_page_and_type(old_page); @@ -683,6 +697,7 @@ private_page_found: /* Update m2p entry */ set_gpfn_from_mfn(mfn_x(page_to_mfn(page)), gfn); + put_gfn(d, gfn); shr_unlock(); return 0; } diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/p2m-pod.c --- a/xen/arch/x86/mm/p2m-pod.c +++ b/xen/arch/x86/mm/p2m-pod.c @@ -531,9 +531,10 @@ p2m_pod_decrease_reservation(struct doma /* FIXME: Add contiguous; query for PSE entries? */ for ( i=0; i<(1<<order); i++) { + p2m_access_t a; p2m_type_t t; - gfn_to_mfn_query(d, gpfn + i, &t); + (void)p2m->get_entry(p2m, gpfn + i, &t, &a, p2m_query, NULL); if ( t == p2m_populate_on_demand ) pod++; @@ -572,8 +573,9 @@ p2m_pod_decrease_reservation(struct doma { mfn_t mfn; p2m_type_t t; + p2m_access_t a; - mfn = gfn_to_mfn_query(d, gpfn + i, &t); + mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, p2m_query, NULL); if ( t == p2m_populate_on_demand ) { set_p2m_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid, p2m->default_access); @@ -653,8 +655,8 @@ p2m_pod_zero_check_superpage(struct p2m_ * and aligned, and mapping them. */ for ( i=0; i<SUPERPAGE_PAGES; i++ ) { - - mfn = gfn_to_mfn_query(d, gfn + i, &type); + p2m_access_t a; + mfn = p2m->get_entry(p2m, gfn + i, &type, &a, p2m_query, NULL); if ( i == 0 ) { @@ -782,7 +784,8 @@ p2m_pod_zero_check(struct p2m_domain *p2 /* First, get the gfn list, translate to mfns, and map the pages. */ for ( i=0; i<count; i++ ) { - mfns[i] = gfn_to_mfn_query(d, gfns[i], types + i); + p2m_access_t a; + mfns[i] = p2m->get_entry(p2m, gfns[i], types + i, &a, p2m_query, NULL); /* If this is ram, and not a pagetable or from the xen heap, and probably not mapped elsewhere, map it; otherwise, skip. 
*/ if ( p2m_is_ram(types[i]) @@ -923,7 +926,8 @@ p2m_pod_emergency_sweep(struct p2m_domai /* FIXME: Figure out how to avoid superpages */ for ( i=p2m->pod.reclaim_single; i > 0 ; i-- ) { - gfn_to_mfn_query(p2m->domain, i, &t ); + p2m_access_t a; + (void)p2m->get_entry(p2m, i, &t, &a, p2m_query, NULL); if ( p2m_is_ram(t) ) { gfns[j] = i; @@ -1112,7 +1116,8 @@ guest_physmap_mark_populate_on_demand(st /* Make sure all gpfns are unused */ for ( i = 0; i < (1UL << order); i++ ) { - omfn = gfn_to_mfn_query(d, gfn + i, &ot); + p2m_access_t a; + omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, p2m_query, NULL); if ( p2m_is_ram(ot) ) { printk("%s: gfn_to_mfn returned type %d!\n", diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/p2m-pt.c --- a/xen/arch/x86/mm/p2m-pt.c +++ b/xen/arch/x86/mm/p2m-pt.c @@ -1051,7 +1051,7 @@ void audit_p2m(struct p2m_domain *p2m, i continue; } - p2mfn = gfn_to_mfn_type_p2m(p2m, gfn, &type, p2m_query); + p2mfn = get_gfn_query(p2m->domain, gfn, &type); if ( strict_m2p && mfn_x(p2mfn) != mfn ) { mpbad++; @@ -1066,15 +1066,17 @@ void audit_p2m(struct p2m_domain *p2m, i * blow away the m2p entry. */ set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY); } + put_gfn(p2m->domain, gfn); if ( test_linear && (gfn <= p2m->max_mapped_pfn) ) { - lp2mfn = mfn_x(gfn_to_mfn_type_p2m(p2m, gfn, &type, p2m_query)); + lp2mfn = mfn_x(get_gfn_query(p2m->domain, gfn, &type)); if ( lp2mfn != mfn_x(p2mfn) ) { P2M_PRINTK("linear mismatch gfn %#lx -> mfn %#lx " "(!= mfn %#lx)\n", gfn, lp2mfn, mfn_x(p2mfn)); } + put_gfn(p2m->domain, gfn); } // P2M_PRINTK("OK: mfn=%#lx, gfn=%#lx, p2mfn=%#lx, lp2mfn=%#lx\n", diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -343,7 +343,6 @@ void p2m_teardown(struct p2m_domain *p2m #ifdef __x86_64__ unsigned long gfn; p2m_type_t t; - p2m_access_t a; mfn_t mfn; #endif @@ -353,13 +352,13 @@ void p2m_teardown(struct p2m_domain *p2m #ifdef __x86_64__ for ( gfn=0; gfn < p2m->max_mapped_pfn; gfn++ ) { - mfn = gfn_to_mfn_type_p2m(p2m, gfn, &t, &a, p2m_query, NULL); + mfn = get_gfn_query(d, gfn, &t); if ( mfn_valid(mfn) && (t == p2m_ram_shared) ) { ASSERT(!p2m_is_nestedp2m(p2m)); BUG_ON(mem_sharing_unshare_page(d, gfn, MEM_SHARING_DESTROY_GFN)); } - + put_gfn(d, gfn); } #endif @@ -454,6 +453,7 @@ guest_physmap_add_entry(struct domain *d struct p2m_domain *p2m = p2m_get_hostp2m(d); unsigned long i, ogfn; p2m_type_t ot; + p2m_access_t a; mfn_t omfn; int pod_count = 0; int rc = 0; @@ -489,12 +489,13 @@ guest_physmap_add_entry(struct domain *d /* First, remove m->p mappings for existing p->m mappings */ for ( i = 0; i < (1UL << page_order); i++ ) { - omfn = gfn_to_mfn_query(d, gfn + i, &ot); + omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, p2m_query, NULL); if ( p2m_is_grant(ot) ) { /* Really shouldn''t be unmapping grant maps this way */ domain_crash(d); p2m_unlock(p2m); + return -EINVAL; } else if ( p2m_is_ram(ot) ) @@ -528,7 +529,7 @@ guest_physmap_add_entry(struct domain *d * address */ P2M_DEBUG("aliased! 
mfn=%#lx, old gfn=%#lx, new gfn=%#lx\n", mfn + i, ogfn, gfn + i); - omfn = gfn_to_mfn_query(d, ogfn, &ot); + omfn = p2m->get_entry(p2m, ogfn, &ot, &a, p2m_query, NULL); if ( p2m_is_ram(ot) ) { ASSERT(mfn_valid(omfn)); @@ -577,6 +578,7 @@ guest_physmap_add_entry(struct domain *d p2m_type_t p2m_change_type(struct domain *d, unsigned long gfn, p2m_type_t ot, p2m_type_t nt) { + p2m_access_t a; p2m_type_t pt; mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -585,7 +587,7 @@ p2m_type_t p2m_change_type(struct domain p2m_lock(p2m); - mfn = gfn_to_mfn_query(d, gfn, &pt); + mfn = p2m->get_entry(p2m, gfn, &pt, &a, p2m_query, NULL); if ( pt == ot ) set_p2m_entry(p2m, gfn, mfn, 0, nt, p2m->default_access); @@ -600,6 +602,7 @@ void p2m_change_type_range(struct domain unsigned long start, unsigned long end, p2m_type_t ot, p2m_type_t nt) { + p2m_access_t a; p2m_type_t pt; unsigned long gfn; mfn_t mfn; @@ -612,7 +615,7 @@ void p2m_change_type_range(struct domain for ( gfn = start; gfn < end; gfn++ ) { - mfn = gfn_to_mfn_query(d, gfn, &pt); + mfn = p2m->get_entry(p2m, gfn, &pt, &a, p2m_query, NULL); if ( pt == ot ) set_p2m_entry(p2m, gfn, mfn, 0, nt, p2m->default_access); } @@ -629,6 +632,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn) { int rc = 0; + p2m_access_t a; p2m_type_t ot; mfn_t omfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -637,7 +641,7 @@ set_mmio_p2m_entry(struct domain *d, uns return 0; p2m_lock(p2m); - omfn = gfn_to_mfn_query(d, gfn, &ot); + omfn = p2m->get_entry(p2m, gfn, &ot, &a, p2m_query, NULL); if ( p2m_is_grant(ot) ) { p2m_unlock(p2m); @@ -657,7 +661,7 @@ set_mmio_p2m_entry(struct domain *d, uns if ( 0 == rc ) gdprintk(XENLOG_ERR, "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n", - mfn_x(gfn_to_mfn_query(d, gfn, &ot))); + mfn_x(p2m->get_entry(p2m, gfn, &ot, &a, p2m_query, NULL))); return rc; } @@ -666,6 +670,7 @@ clear_mmio_p2m_entry(struct domain *d, u { int rc = 0; mfn_t mfn; + p2m_access_t a; p2m_type_t t; struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -673,7 +678,7 @@ clear_mmio_p2m_entry(struct domain *d, u return 0; p2m_lock(p2m); - mfn = gfn_to_mfn_query(d, gfn, &t); + mfn = p2m->get_entry(p2m, gfn, &t, &a, p2m_query, NULL); /* Do not use mfn_valid() here as it will usually fail for MMIO pages. */ if ( (INVALID_MFN == mfn_x(mfn)) || (t != p2m_mmio_direct) ) @@ -696,6 +701,7 @@ set_shared_p2m_entry(struct domain *d, u { struct p2m_domain *p2m = p2m_get_hostp2m(d); int rc = 0; + p2m_access_t a; p2m_type_t ot; mfn_t omfn; @@ -703,7 +709,7 @@ set_shared_p2m_entry(struct domain *d, u return 0; p2m_lock(p2m); - omfn = gfn_to_mfn_query(p2m->domain, gfn, &ot); + omfn = p2m->get_entry(p2m, gfn, &ot, &a, p2m_query, NULL); /* At the moment we only allow p2m change if gfn has already been made * sharable first */ ASSERT(p2m_is_shared(ot)); @@ -717,7 +723,7 @@ set_shared_p2m_entry(struct domain *d, u if ( 0 == rc ) gdprintk(XENLOG_ERR, "set_shared_p2m_entry: set_p2m_entry failed! 
mfn=%08lx\n", - mfn_x(gfn_to_mfn_query(d, gfn, &ot))); + mfn_x(p2m->get_entry(p2m, gfn, &ot, &a, p2m_query, NULL))); return rc; } @@ -1168,7 +1174,7 @@ int p2m_set_mem_access(struct domain *d, { struct p2m_domain *p2m = p2m_get_hostp2m(d); unsigned long pfn; - p2m_access_t a; + p2m_access_t a, _a; p2m_type_t t; mfn_t mfn; int rc = 0; @@ -1202,7 +1208,7 @@ int p2m_set_mem_access(struct domain *d, p2m_lock(p2m); for ( pfn = start_pfn; pfn < start_pfn + nr; pfn++ ) { - mfn = gfn_to_mfn_query(d, pfn, &t); + mfn = p2m->get_entry(p2m, pfn, &t, &_a, p2m_query, NULL); if ( p2m->set_entry(p2m, pfn, mfn, PAGE_ORDER_4K, t, a) == 0 ) { rc = -ENOMEM; diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/shadow/common.c --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -3676,7 +3676,7 @@ int shadow_track_dirty_vram(struct domai /* Iterate over VRAM to track dirty bits. */ for ( i = 0; i < nr; i++ ) { - mfn_t mfn = gfn_to_mfn_query(d, begin_pfn + i, &t); + mfn_t mfn = get_gfn_query(d, begin_pfn + i, &t); struct page_info *page; int dirty = 0; paddr_t sl1ma = dirty_vram->sl1ma[i]; @@ -3741,6 +3741,8 @@ int shadow_track_dirty_vram(struct domai } } + put_gfn(d, begin_pfn + i); + if ( dirty ) { dirty_vram->dirty_bitmap[i / 8] |= 1 << (i % 8); @@ -3761,7 +3763,7 @@ int shadow_track_dirty_vram(struct domai /* was clean for more than two seconds, try to disable guest * write access */ for ( i = begin_pfn; i < end_pfn; i++ ) { - mfn_t mfn = gfn_to_mfn_query(d, i, &t); + mfn_t mfn = get_gfn_query_unlocked(d, i, &t); if (mfn_x(mfn) != INVALID_MFN) flush_tlb |= sh_remove_write_access(d->vcpu[0], mfn, 1, 0); } diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/shadow/multi.c --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -2265,7 +2265,7 @@ static int validate_gl4e(struct vcpu *v, if ( guest_l4e_get_flags(new_gl4e) & _PAGE_PRESENT ) { gfn_t gl3gfn = guest_l4e_get_gfn(new_gl4e); - mfn_t gl3mfn = gfn_to_mfn_query(d, gl3gfn, &p2mt); + mfn_t gl3mfn = get_gfn_query(d, gl3gfn, &p2mt); if ( p2m_is_ram(p2mt) ) sl3mfn = get_shadow_status(v, gl3mfn, SH_type_l3_shadow); else if ( p2mt != p2m_populate_on_demand ) @@ -2275,6 +2275,7 @@ static int validate_gl4e(struct vcpu *v, if ( mfn_valid(sl3mfn) ) shadow_resync_all(v); #endif + put_gfn(d, gfn_x(gl3gfn)); } l4e_propagate_from_guest(v, new_gl4e, sl3mfn, &new_sl4e, ft_prefetch); @@ -2322,7 +2323,7 @@ static int validate_gl3e(struct vcpu *v, if ( guest_l3e_get_flags(new_gl3e) & _PAGE_PRESENT ) { gfn_t gl2gfn = guest_l3e_get_gfn(new_gl3e); - mfn_t gl2mfn = gfn_to_mfn_query(v->domain, gl2gfn, &p2mt); + mfn_t gl2mfn = get_gfn_query(v->domain, gl2gfn, &p2mt); if ( p2m_is_ram(p2mt) ) sl2mfn = get_shadow_status(v, gl2mfn, SH_type_l2_shadow); else if ( p2mt != p2m_populate_on_demand ) @@ -2332,6 +2333,7 @@ static int validate_gl3e(struct vcpu *v, if ( mfn_valid(sl2mfn) ) shadow_resync_all(v); #endif + put_gfn(v->domain, gfn_x(gl2gfn)); } l3e_propagate_from_guest(v, new_gl3e, sl2mfn, &new_sl3e, ft_prefetch); result |= shadow_set_l3e(v, sl3p, new_sl3e, sl3mfn); @@ -2371,11 +2373,12 @@ static int validate_gl2e(struct vcpu *v, } else { - mfn_t gl1mfn = gfn_to_mfn_query(v->domain, gl1gfn, &p2mt); + mfn_t gl1mfn = get_gfn_query(v->domain, gl1gfn, &p2mt); if ( p2m_is_ram(p2mt) ) sl1mfn = get_shadow_status(v, gl1mfn, SH_type_l1_shadow); else if ( p2mt != p2m_populate_on_demand ) result |= SHADOW_SET_ERROR; + put_gfn(v->domain, gfn_x(gl1gfn)); } } l2e_propagate_from_guest(v, new_gl2e, sl1mfn, &new_sl2e, ft_prefetch); @@ -2441,7 +2444,7 
@@ static int validate_gl1e(struct vcpu *v, perfc_incr(shadow_validate_gl1e_calls); gfn = guest_l1e_get_gfn(new_gl1e); - gmfn = gfn_to_mfn_query(v->domain, gfn, &p2mt); + gmfn = get_gfn_query(v->domain, gfn, &p2mt); l1e_propagate_from_guest(v, new_gl1e, gmfn, &new_sl1e, ft_prefetch, p2mt); result |= shadow_set_l1e(v, sl1p, new_sl1e, p2mt, sl1mfn); @@ -2463,6 +2466,7 @@ static int validate_gl1e(struct vcpu *v, } #endif /* OOS */ + put_gfn(v->domain, gfn_x(gfn)); return result; } @@ -2501,10 +2505,11 @@ void sh_resync_l1(struct vcpu *v, mfn_t shadow_l1e_t nsl1e; gfn = guest_l1e_get_gfn(gl1e); - gmfn = gfn_to_mfn_query(v->domain, gfn, &p2mt); + gmfn = get_gfn_query(v->domain, gfn, &p2mt); l1e_propagate_from_guest(v, gl1e, gmfn, &nsl1e, ft_prefetch, p2mt); rc |= shadow_set_l1e(v, sl1p, nsl1e, p2mt, sl1mfn); + put_gfn(v->domain, gfn_x(gfn)); *snpl1p = gl1e; } }); @@ -2824,7 +2829,7 @@ static void sh_prefetch(struct vcpu *v, /* Look at the gfn that the l1e is pointing at */ gfn = guest_l1e_get_gfn(gl1e); - gmfn = gfn_to_mfn_query(v->domain, gfn, &p2mt); + gmfn = get_gfn_query(v->domain, gfn, &p2mt); /* Propagate the entry. */ l1e_propagate_from_guest(v, gl1e, gmfn, &sl1e, ft_prefetch, p2mt); @@ -2834,6 +2839,8 @@ static void sh_prefetch(struct vcpu *v, if ( snpl1p != NULL ) snpl1p[i] = gl1e; #endif /* OOS */ + + put_gfn(v->domain, gfn_x(gfn)); } if ( gl1p != NULL ) sh_unmap_domain_page(gl1p); @@ -3182,7 +3189,7 @@ static int sh_page_fault(struct vcpu *v, /* What mfn is the guest trying to access? */ gfn = guest_l1e_get_gfn(gw.l1e); - gmfn = gfn_to_mfn_guest(d, gfn, &p2mt); + gmfn = get_gfn_guest(d, gfn, &p2mt); if ( shadow_mode_refcounts(d) && ((!p2m_is_valid(p2mt) && !p2m_is_grant(p2mt)) || @@ -3192,6 +3199,7 @@ static int sh_page_fault(struct vcpu *v, SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n", gfn_x(gfn), mfn_x(gmfn)); reset_early_unshadow(v); + put_gfn(d, gfn_x(gfn)); goto propagate; } @@ -3236,6 +3244,7 @@ static int sh_page_fault(struct vcpu *v, if ( rc & GW_RMWR_REWALK ) { paging_unlock(d); + put_gfn(d, gfn_x(gfn)); goto rewalk; } #endif /* OOS */ @@ -3244,6 +3253,7 @@ static int sh_page_fault(struct vcpu *v, { perfc_incr(shadow_inconsistent_gwalk); paging_unlock(d); + put_gfn(d, gfn_x(gfn)); goto rewalk; } @@ -3270,6 +3280,7 @@ static int sh_page_fault(struct vcpu *v, ASSERT(d->is_shutting_down); #endif paging_unlock(d); + put_gfn(d, gfn_x(gfn)); trace_shadow_gen(TRC_SHADOW_DOMF_DYING, va); return 0; } @@ -3287,6 +3298,7 @@ static int sh_page_fault(struct vcpu *v, * failed. We cannot safely continue since some page is still * OOS but not in the hash table anymore. */ paging_unlock(d); + put_gfn(d, gfn_x(gfn)); return 0; } @@ -3296,6 +3308,7 @@ static int sh_page_fault(struct vcpu *v, { perfc_incr(shadow_inconsistent_gwalk); paging_unlock(d); + put_gfn(d, gfn_x(gfn)); goto rewalk; } #endif /* OOS */ @@ -3389,6 +3402,7 @@ static int sh_page_fault(struct vcpu *v, SHADOW_PRINTK("fixed\n"); shadow_audit_tables(v); paging_unlock(d); + put_gfn(d, gfn_x(gfn)); return EXCRET_fault_fixed; emulate: @@ -3457,6 +3471,7 @@ static int sh_page_fault(struct vcpu *v, sh_audit_gw(v, &gw); shadow_audit_tables(v); paging_unlock(d); + put_gfn(d, gfn_x(gfn)); this_cpu(trace_emulate_write_val) = 0; @@ -3595,6 +3610,7 @@ static int sh_page_fault(struct vcpu *v, shadow_audit_tables(v); reset_early_unshadow(v); paging_unlock(d); + put_gfn(d, gfn_x(gfn)); trace_shadow_gen(TRC_SHADOW_MMIO, va); return (handle_mmio_with_translation(va, gpa >> PAGE_SHIFT) ? 
EXCRET_fault_fixed : 0); @@ -3605,6 +3621,7 @@ static int sh_page_fault(struct vcpu *v, shadow_audit_tables(v); reset_early_unshadow(v); paging_unlock(d); + put_gfn(d, gfn_x(gfn)); propagate: trace_not_shadow_fault(gw.l1e, va); @@ -4292,7 +4309,7 @@ sh_update_cr3(struct vcpu *v, int do_loc if ( guest_l3e_get_flags(gl3e[i]) & _PAGE_PRESENT ) { gl2gfn = guest_l3e_get_gfn(gl3e[i]); - gl2mfn = gfn_to_mfn_query(d, gl2gfn, &p2mt); + gl2mfn = get_gfn_query_unlocked(d, gfn_x(gl2gfn), &p2mt); if ( p2m_is_ram(p2mt) ) flush |= sh_remove_write_access(v, gl2mfn, 2, 0); } @@ -4305,13 +4322,14 @@ sh_update_cr3(struct vcpu *v, int do_loc if ( guest_l3e_get_flags(gl3e[i]) & _PAGE_PRESENT ) { gl2gfn = guest_l3e_get_gfn(gl3e[i]); - gl2mfn = gfn_to_mfn_query(d, gl2gfn, &p2mt); + gl2mfn = get_gfn_query(d, gl2gfn, &p2mt); if ( p2m_is_ram(p2mt) ) sh_set_toplevel_shadow(v, i, gl2mfn, (i == 3) ? SH_type_l2h_shadow : SH_type_l2_shadow); else sh_set_toplevel_shadow(v, i, _mfn(INVALID_MFN), 0); + put_gfn(d, gfn_x(gl2gfn)); } else sh_set_toplevel_shadow(v, i, _mfn(INVALID_MFN), 0); @@ -4689,11 +4707,12 @@ static void sh_pagetable_dying(struct vc int flush = 0; int fast_path = 0; paddr_t gcr3 = 0; - mfn_t smfn, gmfn; p2m_type_t p2mt; char *gl3pa = NULL; guest_l3e_t *gl3e = NULL; paddr_t gl2a = 0; + unsigned long l3gfn; + mfn_t l3mfn; paging_lock(v->domain); @@ -4702,8 +4721,9 @@ static void sh_pagetable_dying(struct vc if ( gcr3 == gpa ) fast_path = 1; - gmfn = gfn_to_mfn_query(v->domain, _gfn(gpa >> PAGE_SHIFT), &p2mt); - if ( !mfn_valid(gmfn) || !p2m_is_ram(p2mt) ) + l3gfn = gpa >> PAGE_SHIFT; + l3mfn = get_gfn_query(v->domain, _gfn(l3gfn), &p2mt); + if ( !mfn_valid(l3mfn) || !p2m_is_ram(p2mt) ) { printk(XENLOG_DEBUG "sh_pagetable_dying: gpa not valid %"PRIpaddr"\n", gpa); @@ -4711,19 +4731,24 @@ static void sh_pagetable_dying(struct vc } if ( !fast_path ) { - gl3pa = sh_map_domain_page(gmfn); + gl3pa = sh_map_domain_page(l3mfn); gl3e = (guest_l3e_t *)(gl3pa + ((unsigned long)gpa & ~PAGE_MASK)); } for ( i = 0; i < 4; i++ ) { + unsigned long gfn; + mfn_t smfn, gmfn; + if ( fast_path ) smfn = _mfn(pagetable_get_pfn(v->arch.shadow_table[i])); else { /* retrieving the l2s */ gl2a = guest_l3e_get_paddr(gl3e[i]); - gmfn = gfn_to_mfn_query(v->domain, _gfn(gl2a >> PAGE_SHIFT), &p2mt); + gfn = gl2a >> PAGE_SHIFT; + gmfn = get_gfn_query(v->domain, _gfn(gfn), &p2mt); smfn = shadow_hash_lookup(v, mfn_x(gmfn), SH_type_l2_pae_shadow); + put_gfn(v->domain, gfn); } if ( mfn_valid(smfn) ) @@ -4747,6 +4772,7 @@ static void sh_pagetable_dying(struct vc out: if ( !fast_path ) unmap_domain_page(gl3pa); + put_gfn(v->domain, l3gfn); paging_unlock(v->domain); } #else @@ -4757,12 +4783,14 @@ static void sh_pagetable_dying(struct vc paging_lock(v->domain); - gmfn = gfn_to_mfn_query(v->domain, _gfn(gpa >> PAGE_SHIFT), &p2mt); + gmfn = get_gfn_query(v->domain, _gfn(gpa >> PAGE_SHIFT), &p2mt); #if GUEST_PAGING_LEVELS == 2 smfn = shadow_hash_lookup(v, mfn_x(gmfn), SH_type_l2_32_shadow); #else smfn = shadow_hash_lookup(v, mfn_x(gmfn), SH_type_l4_64_shadow); #endif + put_gfn(v->domain, gpa >> PAGE_SHIFT); + if ( mfn_valid(smfn) ) { mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying; @@ -4811,15 +4839,22 @@ static mfn_t emulate_gva_to_mfn(struct v /* Translate the GFN to an MFN */ ASSERT(!paging_locked_by_me(v->domain)); - mfn = gfn_to_mfn_guest(v->domain, _gfn(gfn), &p2mt); + mfn = get_gfn_guest(v->domain, _gfn(gfn), &p2mt); if ( p2m_is_readonly(p2mt) ) + { + put_gfn(v->domain, gfn); return _mfn(READONLY_GFN); + } if ( !p2m_is_ram(p2mt) ) + { + 
put_gfn(v->domain, gfn); return _mfn(BAD_GFN_TO_MFN); + } ASSERT(mfn_valid(mfn)); v->arch.paging.last_write_was_pt = !!sh_mfn_is_a_page_table(mfn); + put_gfn(v->domain, gfn); return mfn; } @@ -5220,7 +5255,7 @@ int sh_audit_l1_table(struct vcpu *v, mf { gfn = guest_l1e_get_gfn(*gl1e); mfn = shadow_l1e_get_mfn(*sl1e); - gmfn = gfn_to_mfn_query(v->domain, gfn, &p2mt); + gmfn = get_gfn_query_unlocked(v->domain, gfn_x(gfn), &p2mt); if ( !p2m_is_grant(p2mt) && mfn_x(gmfn) != mfn_x(mfn) ) AUDIT_FAIL(1, "bad translation: gfn %" SH_PRI_gfn " --> %" PRI_mfn " != mfn %" PRI_mfn, @@ -5291,16 +5326,17 @@ int sh_audit_l2_table(struct vcpu *v, mf mfn = shadow_l2e_get_mfn(*sl2e); gmfn = (guest_l2e_get_flags(*gl2e) & _PAGE_PSE) ? get_fl1_shadow_status(v, gfn) - : get_shadow_status(v, gfn_to_mfn_query(v->domain, gfn, &p2mt), - SH_type_l1_shadow); + : get_shadow_status(v, + get_gfn_query_unlocked(v->domain, gfn_x(gfn), + &p2mt), SH_type_l1_shadow); if ( mfn_x(gmfn) != mfn_x(mfn) ) AUDIT_FAIL(2, "bad translation: gfn %" SH_PRI_gfn " (--> %" PRI_mfn ")" " --> %" PRI_mfn " != mfn %" PRI_mfn, gfn_x(gfn), (guest_l2e_get_flags(*gl2e) & _PAGE_PSE) ? 0 - : mfn_x(gfn_to_mfn_query(v->domain, - gfn, &p2mt)), mfn_x(gmfn), mfn_x(mfn)); + : mfn_x(get_gfn_query_unlocked(v->domain, + gfn_x(gfn), &p2mt)), mfn_x(gmfn), mfn_x(mfn)); } }); sh_unmap_domain_page(gp); @@ -5339,7 +5375,8 @@ int sh_audit_l3_table(struct vcpu *v, mf { gfn = guest_l3e_get_gfn(*gl3e); mfn = shadow_l3e_get_mfn(*sl3e); - gmfn = get_shadow_status(v, gfn_to_mfn_query(v->domain, gfn, &p2mt), + gmfn = get_shadow_status(v, get_gfn_query_unlocked( + v->domain, gfn_x(gfn), &p2mt), ((GUEST_PAGING_LEVELS == 3 || is_pv_32on64_vcpu(v)) && !shadow_mode_external(v->domain) @@ -5387,8 +5424,8 @@ int sh_audit_l4_table(struct vcpu *v, mf { gfn = guest_l4e_get_gfn(*gl4e); mfn = shadow_l4e_get_mfn(*sl4e); - gmfn = get_shadow_status(v, gfn_to_mfn_query(v->domain, - gfn, &p2mt), + gmfn = get_shadow_status(v, get_gfn_query_unlocked( + v->domain, gfn_x(gfn), &p2mt), SH_type_l3_shadow); if ( mfn_x(gmfn) != mfn_x(mfn) ) AUDIT_FAIL(4, "bad translation: gfn %" SH_PRI_gfn diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/shadow/types.h --- a/xen/arch/x86/mm/shadow/types.h +++ b/xen/arch/x86/mm/shadow/types.h @@ -191,11 +191,11 @@ static inline shadow_l4e_t shadow_l4e_fr }) #endif - /* Override gfn_to_mfn to work with gfn_t */ -#undef gfn_to_mfn_query -#define gfn_to_mfn_query(d, g, t) gfn_to_mfn_type((d), gfn_x(g), (t), p2m_query) -#undef gfn_to_mfn_guest -#define gfn_to_mfn_guest(d, g, t) gfn_to_mfn_type((d), gfn_x(g), (t), p2m_guest) + /* Override get_gfn to work with gfn_t */ +#undef get_gfn_query +#define get_gfn_query(d, g, t) get_gfn_type((d), gfn_x(g), (t), p2m_query) +#undef get_gfn_guest +#define get_gfn_guest(d, g, t) get_gfn_type((d), gfn_x(g), (t), p2m_guest) /* The shadow types needed for the various levels. 
*/ diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/physdev.c --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -297,16 +297,20 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H break; ret = -EINVAL; - mfn = gmfn_to_mfn(current->domain, info.gmfn); + mfn = get_gfn_untyped(current->domain, info.gmfn); if ( !mfn_valid(mfn) || !get_page_and_type(mfn_to_page(mfn), v->domain, PGT_writable_page) ) + { + put_gfn(current->domain, info.gmfn); break; + } if ( cmpxchg(&v->domain->arch.pv_domain.pirq_eoi_map_mfn, 0, mfn) != 0 ) { put_page_and_type(mfn_to_page(mfn)); + put_gfn(current->domain, info.gmfn); ret = -EBUSY; break; } @@ -316,10 +320,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H { v->domain->arch.pv_domain.pirq_eoi_map_mfn = 0; put_page_and_type(mfn_to_page(mfn)); + put_gfn(current->domain, info.gmfn); ret = -ENOSPC; break; } + put_gfn(current->domain, info.gmfn); ret = 0; break; } diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/traps.c --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -673,11 +673,12 @@ int wrmsr_hypervisor_regs(uint32_t idx, return 0; } - mfn = gmfn_to_mfn(d, gmfn); + mfn = get_gfn_untyped(d, gmfn); if ( !mfn_valid(mfn) || !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) { + put_gfn(d, gmfn); gdprintk(XENLOG_WARNING, "Bad GMFN %lx (MFN %lx) to MSR %08x\n", gmfn, mfn, base + idx); @@ -689,6 +690,7 @@ int wrmsr_hypervisor_regs(uint32_t idx, unmap_domain_page(hypercall_page); put_page_and_type(mfn_to_page(mfn)); + put_gfn(d, gmfn); break; } @@ -2347,18 +2349,25 @@ static int emulate_privileged_op(struct arch_set_cr2(v, *reg); break; - case 3: /* Write CR3 */ + case 3: {/* Write CR3 */ + unsigned long mfn, gfn; domain_lock(v->domain); if ( !is_pv_32on64_vcpu(v) ) - rc = new_guest_cr3(gmfn_to_mfn(v->domain, xen_cr3_to_pfn(*reg))); + { + gfn = xen_cr3_to_pfn(*reg); #ifdef CONFIG_COMPAT - else - rc = new_guest_cr3(gmfn_to_mfn(v->domain, compat_cr3_to_pfn(*reg))); + } else { + gfn = compat_cr3_to_pfn(*reg); #endif + } + mfn = get_gfn_untyped(v->domain, gfn); + rc = new_guest_cr3(mfn); + put_gfn(v->domain, gfn); domain_unlock(v->domain); if ( rc == 0 ) /* not okay */ goto fail; break; + } case 4: /* Write CR4 */ v->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(v, *reg); diff -r f07d4ebe5248 -r e9fd07479f68 xen/common/grant_table.c --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -110,7 +110,7 @@ static unsigned inline int max_nr_maptra #define gfn_to_mfn_private(_d, _gfn) ({ \ p2m_type_t __p2mt; \ unsigned long __x; \ - __x = mfn_x(gfn_to_mfn_unshare((_d), (_gfn), &__p2mt)); \ + __x = mfn_x(get_gfn_unshare((_d), (_gfn), &__p2mt)); \ BUG_ON(p2m_is_shared(__p2mt)); /* XXX fixme */ \ if ( !p2m_is_valid(__p2mt) ) \ __x = INVALID_MFN; \ @@ -150,10 +150,10 @@ static int __get_paged_frame(unsigned lo mfn_t mfn; if ( readonly ) - mfn = gfn_to_mfn(rd, gfn, &p2mt); + mfn = get_gfn(rd, gfn, &p2mt); else { - mfn = gfn_to_mfn_unshare(rd, gfn, &p2mt); + mfn = get_gfn_unshare(rd, gfn, &p2mt); BUG_ON(p2m_is_shared(p2mt)); /* XXX Here, and above in gfn_to_mfn_private, need to handle * XXX failure to unshare. */ @@ -164,14 +164,16 @@ static int __get_paged_frame(unsigned lo if ( p2m_is_paging(p2mt) ) { p2m_mem_paging_populate(rd, gfn); + put_gfn(rd, gfn); rc = GNTST_eagain; } } else { + put_gfn(rd, gfn); *frame = INVALID_MFN; rc = GNTST_bad_page; } #else - *frame = readonly ? gmfn_to_mfn(rd, gfn) : gfn_to_mfn_private(rd, gfn); + *frame = readonly ? 
get_gfn_untyped(rd, gfn) : gfn_to_mfn_private(rd, gfn); #endif return rc; @@ -468,13 +470,14 @@ __gnttab_map_grant_ref( struct domain *ld, *rd, *owner; struct vcpu *led; int handle; + unsigned long gfn = INVALID_GFN; unsigned long frame = 0, nr_gets = 0; struct page_info *pg; int rc = GNTST_okay; u32 old_pin; u32 act_pin; unsigned int cache_flags; - struct active_grant_entry *act; + struct active_grant_entry *act = NULL; struct grant_mapping *mt; grant_entry_v1_t *sha1; grant_entry_v2_t *sha2; @@ -565,7 +568,6 @@ __gnttab_map_grant_ref( if ( !act->pin ) { - unsigned long gfn; unsigned long frame; gfn = sha1 ? sha1->frame : sha2->full_page.frame; @@ -698,6 +700,7 @@ __gnttab_map_grant_ref( op->handle = handle; op->status = GNTST_okay; + put_gfn(rd, gfn); rcu_unlock_domain(rd); return; @@ -735,6 +738,8 @@ __gnttab_map_grant_ref( gnttab_clear_flag(_GTF_reading, status); unlock_out: + if ( gfn != INVALID_GFN ) + put_gfn(rd, gfn); spin_unlock(&rd->grant_table->lock); op->status = rc; put_maptrack_handle(ld->grant_table, handle); @@ -1475,6 +1480,7 @@ gnttab_transfer( /* Check the passed page frame for basic validity. */ if ( unlikely(!mfn_valid(mfn)) ) { + put_gfn(d, gop.mfn); gdprintk(XENLOG_INFO, "gnttab_transfer: out-of-range %lx\n", (unsigned long)gop.mfn); gop.status = GNTST_bad_page; @@ -1484,6 +1490,7 @@ gnttab_transfer( page = mfn_to_page(mfn); if ( unlikely(is_xen_heap_page(page)) ) { + put_gfn(d, gop.mfn); gdprintk(XENLOG_INFO, "gnttab_transfer: xen frame %lx\n", (unsigned long)gop.mfn); gop.status = GNTST_bad_page; @@ -1492,6 +1499,7 @@ gnttab_transfer( if ( steal_page(d, page, 0) < 0 ) { + put_gfn(d, gop.mfn); gop.status = GNTST_bad_page; goto copyback; } @@ -1504,6 +1512,7 @@ gnttab_transfer( /* Find the target domain. */ if ( unlikely((e = rcu_lock_domain_by_id(gop.domid)) == NULL) ) { + put_gfn(d, gop.mfn); gdprintk(XENLOG_INFO, "gnttab_transfer: can''t find domain %d\n", gop.domid); page->count_info &= ~(PGC_count_mask|PGC_allocated); @@ -1514,6 +1523,7 @@ gnttab_transfer( if ( xsm_grant_transfer(d, e) ) { + put_gfn(d, gop.mfn); gop.status = GNTST_permission_denied; unlock_and_copyback: rcu_unlock_domain(e); @@ -1566,6 +1576,7 @@ gnttab_transfer( e->tot_pages, e->max_pages, gop.ref, e->is_dying); spin_unlock(&e->page_alloc_lock); rcu_unlock_domain(e); + put_gfn(d, gop.mfn); page->count_info &= ~(PGC_count_mask|PGC_allocated); free_domheap_page(page); gop.status = GNTST_general_error; @@ -1579,6 +1590,7 @@ gnttab_transfer( page_set_owner(page, e); spin_unlock(&e->page_alloc_lock); + put_gfn(d, gop.mfn); TRACE_1D(TRC_MEM_PAGE_GRANT_TRANSFER, e->domain_id); @@ -1850,6 +1862,8 @@ __acquire_grant_for_copy( { gfn = sha1->frame; rc = __get_paged_frame(gfn, &grant_frame, readonly, rd); + /* We drop this immediately per the comments at the top */ + put_gfn(rd, gfn); if ( rc != GNTST_okay ) goto unlock_out; act->gfn = gfn; @@ -1862,6 +1876,7 @@ __acquire_grant_for_copy( { gfn = sha2->full_page.frame; rc = __get_paged_frame(gfn, &grant_frame, readonly, rd); + put_gfn(rd, gfn); if ( rc != GNTST_okay ) goto unlock_out; act->gfn = gfn; @@ -1874,6 +1889,7 @@ __acquire_grant_for_copy( { gfn = sha2->sub_page.frame; rc = __get_paged_frame(gfn, &grant_frame, readonly, rd); + put_gfn(rd, gfn); if ( rc != GNTST_okay ) goto unlock_out; act->gfn = gfn; @@ -1973,6 +1989,7 @@ __gnttab_copy( { #ifdef CONFIG_X86 rc = __get_paged_frame(op->source.u.gmfn, &s_frame, 1, sd); + put_gfn(sd, op->source.u.gmfn); if ( rc != GNTST_okay ) goto error_out; #else @@ -2012,6 +2029,7 @@ __gnttab_copy( { #ifdef CONFIG_X86 
rc = __get_paged_frame(op->dest.u.gmfn, &d_frame, 0, dd); + put_gfn(dd, op->dest.u.gmfn); if ( rc != GNTST_okay ) goto error_out; #else diff -r f07d4ebe5248 -r e9fd07479f68 xen/common/memory.c --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -162,11 +162,12 @@ int guest_remove_page(struct domain *d, unsigned long mfn; #ifdef CONFIG_X86 - mfn = mfn_x(gfn_to_mfn(d, gmfn, &p2mt)); + mfn = mfn_x(get_gfn(d, gmfn, &p2mt)); if ( unlikely(p2m_is_paging(p2mt)) ) { guest_physmap_remove_page(d, gmfn, mfn, 0); p2m_mem_paging_drop_page(d, gmfn); + put_gfn(d, gmfn); return 1; } #else @@ -174,6 +175,7 @@ int guest_remove_page(struct domain *d, #endif if ( unlikely(!mfn_valid(mfn)) ) { + put_gfn(d, gmfn); gdprintk(XENLOG_INFO, "Domain %u page number %lx invalid\n", d->domain_id, gmfn); return 0; @@ -187,12 +189,14 @@ int guest_remove_page(struct domain *d, { put_page_and_type(page); guest_physmap_remove_page(d, gmfn, mfn, 0); + put_gfn(d, gmfn); return 1; } #endif /* CONFIG_X86 */ if ( unlikely(!get_page(page, d)) ) { + put_gfn(d, gmfn); gdprintk(XENLOG_INFO, "Bad page free for domain %u\n", d->domain_id); return 0; } @@ -204,6 +208,7 @@ int guest_remove_page(struct domain *d, put_page(page); guest_physmap_remove_page(d, gmfn, mfn, 0); + put_gfn(d, gmfn); put_page(page); @@ -363,9 +368,10 @@ static long memory_exchange(XEN_GUEST_HA p2m_type_t p2mt; /* Shared pages cannot be exchanged */ - mfn = mfn_x(gfn_to_mfn_unshare(d, gmfn + k, &p2mt)); + mfn = mfn_x(get_gfn_unshare(d, gmfn + k, &p2mt)); if ( p2m_is_shared(p2mt) ) { + put_gfn(d, gmfn + k); rc = -ENOMEM; goto fail; } @@ -374,6 +380,7 @@ static long memory_exchange(XEN_GUEST_HA #endif if ( unlikely(!mfn_valid(mfn)) ) { + put_gfn(d, gmfn + k); rc = -EINVAL; goto fail; } @@ -382,11 +389,13 @@ static long memory_exchange(XEN_GUEST_HA if ( unlikely(steal_page(d, page, MEMF_no_refcount)) ) { + put_gfn(d, gmfn + k); rc = -EINVAL; goto fail; } page_list_add(page, &in_chunk_list); + put_gfn(d, gmfn + k); } } diff -r f07d4ebe5248 -r e9fd07479f68 xen/common/tmem_xen.c --- a/xen/common/tmem_xen.c +++ b/xen/common/tmem_xen.c @@ -109,22 +109,28 @@ static inline void *cli_get_page(tmem_cl struct page_info *page; int ret; - cli_mfn = mfn_x(gfn_to_mfn(current->domain, cmfn, &t)); + cli_mfn = mfn_x(get_gfn(current->domain, cmfn, &t)); if ( t != p2m_ram_rw || !mfn_valid(cli_mfn) ) + { + put_gfn(current->domain, (unsigned long) cmfn); return NULL; + } page = mfn_to_page(cli_mfn); if ( cli_write ) ret = get_page_and_type(page, current->domain, PGT_writable_page); else ret = get_page(page, current->domain); if ( !ret ) + { + put_gfn(current->domain, (unsigned long) cmfn); return NULL; + } *pcli_mfn = cli_mfn; *pcli_pfp = (pfp_t *)page; return map_domain_page(cli_mfn); } -static inline void cli_put_page(void *cli_va, pfp_t *cli_pfp, +static inline void cli_put_page(tmem_cli_mfn_t cmfn, void *cli_va, pfp_t *cli_pfp, unsigned long cli_mfn, bool_t mark_dirty) { if ( mark_dirty ) @@ -135,6 +141,7 @@ static inline void cli_put_page(void *cl else put_page((struct page_info *)cli_pfp); unmap_domain_page(cli_va); + put_gfn(current->domain, (unsigned long) cmfn); } #endif @@ -169,7 +176,7 @@ EXPORT int tmh_copy_from_client(pfp_t *p (pfn_offset+len <= PAGE_SIZE) ) memcpy((char *)tmem_va+tmem_offset,(char *)cli_va+pfn_offset,len); if ( !tmemc ) - cli_put_page(cli_va, cli_pfp, cli_mfn, 0); + cli_put_page(cmfn, cli_va, cli_pfp, cli_mfn, 0); unmap_domain_page(tmem_va); return 1; } @@ -197,7 +204,7 @@ EXPORT int tmh_compress_from_client(tmem ASSERT(ret == LZO_E_OK); *out_va = dmem; 
if ( !tmemc ) - cli_put_page(cli_va, cli_pfp, cli_mfn, 0); + cli_put_page(cmfn, cli_va, cli_pfp, cli_mfn, 0); unmap_domain_page(cli_va); return 1; } @@ -225,7 +232,7 @@ EXPORT int tmh_copy_to_client(tmem_cli_m memcpy((char *)cli_va+pfn_offset,(char *)tmem_va+tmem_offset,len); unmap_domain_page(tmem_va); if ( !tmemc ) - cli_put_page(cli_va, cli_pfp, cli_mfn, 1); + cli_put_page(cmfn, cli_va, cli_pfp, cli_mfn, 1); mb(); return 1; } @@ -249,7 +256,7 @@ EXPORT int tmh_decompress_to_client(tmem ASSERT(ret == LZO_E_OK); ASSERT(out_len == PAGE_SIZE); if ( !tmemc ) - cli_put_page(cli_va, cli_pfp, cli_mfn, 1); + cli_put_page(cmfn, cli_va, cli_pfp, cli_mfn, 1); mb(); return 1; } @@ -271,7 +278,7 @@ EXPORT int tmh_copy_tze_to_client(tmem_c memcpy((char *)cli_va,(char *)tmem_va,len); if ( len < PAGE_SIZE ) memset((char *)cli_va+len,0,PAGE_SIZE-len); - cli_put_page(cli_va, cli_pfp, cli_mfn, 1); + cli_put_page(cmfn, cli_va, cli_pfp, cli_mfn, 1); mb(); return 1; } diff -r f07d4ebe5248 -r e9fd07479f68 xen/include/asm-ia64/mm.h --- a/xen/include/asm-ia64/mm.h +++ b/xen/include/asm-ia64/mm.h @@ -549,6 +549,8 @@ extern u64 translate_domain_pte(u64 ptev #define gmfn_to_mfn(_d, gpfn) \ gmfn_to_mfn_foreign((_d), (gpfn)) +#define put_gfn(d, g) ((void)0) + #define __gpfn_invalid(_d, gpfn) \ (lookup_domain_mpa((_d), ((gpfn)<<PAGE_SHIFT), NULL) == INVALID_MFN) diff -r f07d4ebe5248 -r e9fd07479f68 xen/include/asm-x86/guest_pt.h --- a/xen/include/asm-x86/guest_pt.h +++ b/xen/include/asm-x86/guest_pt.h @@ -51,9 +51,9 @@ gfn_to_paddr(gfn_t gfn) return ((paddr_t)gfn_x(gfn)) << PAGE_SHIFT; } -/* Override gfn_to_mfn to work with gfn_t */ -#undef gfn_to_mfn -#define gfn_to_mfn(d, g, t) gfn_to_mfn_type((d), gfn_x(g), (t), p2m_alloc) +/* Override get_gfn to work with gfn_t */ +#undef get_gfn +#define get_gfn(d, g, t) get_gfn_type((d), gfn_x(g), (t), p2m_alloc) /* Types of the guest''s page tables and access functions for them */ diff -r f07d4ebe5248 -r e9fd07479f68 xen/include/asm-x86/hvm/hvm.h --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -394,7 +394,10 @@ int hvm_virtual_to_linear_addr( void *hvm_map_guest_frame_rw(unsigned long gfn); void *hvm_map_guest_frame_ro(unsigned long gfn); -void hvm_unmap_guest_frame(void *p); +/* We pass back either the guest virtual or physical frame mapped, + * in order to drop any locks/refcounts we may have had on p2m + * entries or underlying mfn''s while using the map */ +void hvm_unmap_guest_frame(void *p, unsigned long addr, int is_va); static inline void hvm_set_info_guest(struct vcpu *v) { diff -r f07d4ebe5248 -r e9fd07479f68 xen/include/asm-x86/hvm/vmx/vvmx.h --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -25,6 +25,7 @@ struct nestedvmx { paddr_t vmxon_region_pa; + unsigned long iobitmap_gfn[2]; void *iobitmap[2]; /* map (va) of L1 guest I/O bitmap */ /* deferred nested interrupt */ struct { diff -r f07d4ebe5248 -r e9fd07479f68 xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -305,9 +305,12 @@ struct p2m_domain *p2m_get_p2m(struct vc #define p2m_get_pagetable(p2m) ((p2m)->phys_table) +/**** p2m query accessors. After calling any of the variants below, you + * need to call put_gfn(domain, gfn). If you don''t, you''ll lock the + * hypervisor. ****/ /* Read a particular P2M table, mapping pages as we go. 
Most callers - * should _not_ call this directly; use the other gfn_to_mfn_* functions + * should _not_ call this directly; use the other get_gfn* functions * below unless you know you want to walk a p2m that isn''t a domain''s * main one. * If the lookup succeeds, the return value is != INVALID_MFN and @@ -318,7 +321,7 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom unsigned int *page_order); /* General conversion function from gfn to mfn */ -static inline mfn_t gfn_to_mfn_type(struct domain *d, +static inline mfn_t get_gfn_type(struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q) { @@ -327,25 +330,39 @@ static inline mfn_t gfn_to_mfn_type(stru } /* Syntactic sugar: most callers will use one of these. - * N.B. gfn_to_mfn_query() is the _only_ one guaranteed not to take the + * N.B. get_gfn_query() is the _only_ one guaranteed not to take the * p2m lock; none of the others can be called with the p2m or paging * lock held. */ -#define gfn_to_mfn(d, g, t) gfn_to_mfn_type((d), (g), (t), p2m_alloc) -#define gfn_to_mfn_query(d, g, t) gfn_to_mfn_type((d), (g), (t), p2m_query) -#define gfn_to_mfn_guest(d, g, t) gfn_to_mfn_type((d), (g), (t), p2m_guest) -#define gfn_to_mfn_unshare(d, g, t) gfn_to_mfn_type((d), (g), (t), p2m_unshare) +#define get_gfn(d, g, t) get_gfn_type((d), (g), (t), p2m_alloc) +#define get_gfn_query(d, g, t) get_gfn_type((d), (g), (t), p2m_query) +#define get_gfn_guest(d, g, t) get_gfn_type((d), (g), (t), p2m_guest) +#define get_gfn_unshare(d, g, t) get_gfn_type((d), (g), (t), p2m_unshare) /* Compatibility function exporting the old untyped interface */ -static inline unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn) +static inline unsigned long get_gfn_untyped(struct domain *d, unsigned long gpfn) { mfn_t mfn; p2m_type_t t; - mfn = gfn_to_mfn(d, gpfn, &t); + mfn = get_gfn(d, gpfn, &t); if ( p2m_is_valid(t) ) return mfn_x(mfn); return INVALID_MFN; } +/* This is a noop for now. */ +static inline void put_gfn(struct domain *d, unsigned long gfn) +{ +} + +/* These are identical for now. The intent is to have the caller not worry + * about put_gfn. To only be used in printk''s, crash situations, or to + * peek at a type. You''re not holding the p2m entry exclsively after calling + * this. */ +#define get_gfn_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_alloc) +#define get_gfn_query_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_query) +#define get_gfn_guest_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_guest) +#define get_gfn_unshare_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_unshare) + /* General conversion function from mfn to gfn */ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn) { @@ -529,7 +546,8 @@ static inline int p2m_gfn_check_limit( #define p2m_gfn_check_limit(d, g, o) 0 #endif -/* Directly set a p2m entry: only for use by p2m code */ +/* Directly set a p2m entry: only for use by p2m code. Does not need + * a call to put_gfn afterwards/ */ int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2011-Nov-10 12:26 UTC
Re: [Xen-devel] [PATCH 5 of 5] Modify naming of queries into the p2m
Hi, This mostly looks good; I have a few comments on the detail below. If those are addressed I hope I can apply the next version. Any other maintainers want to object to this renaming? At 22:28 -0500 on 07 Nov (1320704913), Andres Lagar-Cavilla wrote:> @@ -1182,6 +1195,7 @@ int hvm_hap_nested_page_fault(unsigned l > mfn_t mfn; > struct vcpu *v = current; > struct p2m_domain *p2m; > + int rc; > > /* On Nested Virtualization, walk the guest page table. > * If this succeeds, all is fine. > @@ -1255,8 +1269,8 @@ int hvm_hap_nested_page_fault(unsigned l > if ( violation ) > { > p2m_mem_access_check(gpa, gla_valid, gla, access_r, access_w, access_x); > - > - return 1; > + rc = 1; > + goto out_put_gfn;Shouldn''t this patch be changing the call to gfn_to_mfn_type_p2m() just above here to a get_gfn_*() call too?> -void hvm_unmap_guest_frame(void *p) > +void hvm_unmap_guest_frame(void *p, unsigned long addr, int is_va) > { > + /* We enter this function with a map obtained in __hvm_map_guest_frame. > + * This map performed a p2m query that locked the gfn entry and got > + * a ref on the mfn. Must undo */ > if ( p ) > + { > + unsigned long gfn = INVALID_GFN; > + > + if ( is_va ) > + { > + if ( addr ) > + { > + uint32_t pfec = PFEC_page_present; > + gfn = paging_gva_to_gfn(current, addr, &pfec); > + } else { > + gfn = INVALID_GFN; > + } > + } else { > + gfn = addr; > + } > + > + if ( gfn != INVALID_GFN ) > + put_gfn(current->domain, gfn); > + > unmap_domain_page(p); > + } > }This is a pretty wierd-looking function now - I think it would be better just to make all callers have to remember the GFN. In fact, since a guest VCPU can change its pagetables at any time, it''s not safe to use paging_gva_to_gfn() to regenerate the GFN from a VA. Also, IIRC the nested-SVM code keeps the hvm_map mapping around for quite a long time so in fact this unmap-and-put-gfn may not be the right interface. Maybe the caller should put_gfn() explicitly.> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/svm/svm.c > --- a/xen/arch/x86/hvm/svm/svm.c > +++ b/xen/arch/x86/hvm/svm/svm.c > @@ -244,9 +244,10 @@ static int svm_vmcb_restore(struct vcpu > { > if ( c->cr0 & X86_CR0_PG ) > { > - mfn = mfn_x(gfn_to_mfn(v->domain, c->cr3 >> PAGE_SHIFT, &p2mt)); > + mfn = mfn_x(get_gfn(v->domain, c->cr3 >> PAGE_SHIFT, &p2mt)); > if ( !p2m_is_ram(p2mt) || !get_page(mfn_to_page(mfn), v->domain) ) > { > + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); > gdprintk(XENLOG_ERR, "Invalid CR3 value=0x%"PRIx64"\n", > c->cr3); > return -EINVAL; > @@ -257,6 +258,8 @@ static int svm_vmcb_restore(struct vcpu > put_page(pagetable_get_page(v->arch.guest_table)); > > v->arch.guest_table = pagetable_from_pfn(mfn); > + if ( c->cr0 & X86_CR0_PG ) > + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); > } > > v->arch.hvm_vcpu.guest_cr[0] = c->cr0 | X86_CR0_ET; > @@ -1144,7 +1147,6 @@ static void svm_do_nested_pgfault(struct > unsigned long gfn = gpa >> PAGE_SHIFT; > mfn_t mfn; > p2m_type_t p2mt; > - p2m_access_t p2ma; > struct p2m_domain *p2m = NULL; > > ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0); > @@ -1161,7 +1163,8 @@ static void svm_do_nested_pgfault(struct > p2m = p2m_get_p2m(v); > _d.gpa = gpa; > _d.qualification = 0; > - _d.mfn = mfn_x(gfn_to_mfn_type_p2m(p2m, gfn, &_d.p2mt, &p2ma, p2m_query, NULL)); > + mfn = get_gfn_query_unlocked(p2m->domain, gfn, &_d.p2mt); > + _d.mfn = mfn_x(mfn);Ah - this is not quite the same thing. 
This lookup uses a specific p2m table (possibly a nested-p2m table reflecting the fact the guest is running in nested SVM mode) so it can''t be replaced by a call that just takes the domain pointer (and will internally use the domain''s master p2m table). The naming of ''p2m_get_p2m()'' is particularly unhelpful here, I realise. It fetches the running p2m, i.e. the one that hardware sees as an NPT table; almost everywhere else uses p2m_get_hostp2m(), which fetches the master p2m. But in general when you see the gfn_to_mfn_*_p2m() functions, that take an explicit p2m pointer, you need to be careful.> > __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); > } > @@ -1181,7 +1184,7 @@ static void svm_do_nested_pgfault(struct > if ( p2m == NULL ) > p2m = p2m_get_p2m(v); > /* Everything else is an error. */ > - mfn = gfn_to_mfn_type_p2m(p2m, gfn, &p2mt, &p2ma, p2m_guest, NULL); > + mfn = get_gfn_guest_unlocked(p2m->domain, gfn, &p2mt);Likewise here.> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/guest_walk.c > --- a/xen/arch/x86/mm/guest_walk.c > +++ b/xen/arch/x86/mm/guest_walk.c > @@ -86,30 +86,33 @@ static uint32_t set_ad_bits(void *guest_ > return 0; > } > > +/* If the map is non-NULL, we leave this function having > + * called get_gfn, you need to call put_gfn. */ > static inline void *map_domain_gfn(struct p2m_domain *p2m, > gfn_t gfn, > mfn_t *mfn, > p2m_type_t *p2mt, > uint32_t *rc) > { > - p2m_access_t a; > - > /* Translate the gfn, unsharing if shared */ > - *mfn = gfn_to_mfn_type_p2m(p2m, gfn_x(gfn), p2mt, &a, p2m_unshare, NULL); > + *mfn = get_gfn_unshare(p2m->domain, gfn_x(gfn), p2mt);Here''s another case where we need to handle the nested-hap path; the p2m pointer here must be propagated into the lookup function.> if ( p2m_is_paging(*p2mt) ) > { > ASSERT(!p2m_is_nestedp2m(p2m)); > p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); > + put_gfn(p2m->domain, gfn_x(gfn)); > *rc = _PAGE_PAGED; > return NULL; > } > if ( p2m_is_shared(*p2mt) ) > { > + put_gfn(p2m->domain, gfn_x(gfn)); > *rc = _PAGE_SHARED; > return NULL; > } > if ( !p2m_is_ram(*p2mt) ) > { > + put_gfn(p2m->domain, gfn_x(gfn)); > *rc |= _PAGE_PRESENT; > return NULL; > } > @@ -120,6 +123,9 @@ static inline void *map_domain_gfn(struc > > > /* Walk the guest pagetables, after the manner of a hardware walker. */ > +/* Because the walk is essentially random, it can cause a deadlock > + * warning in the p2m locking code. Highly unlikely this is an actual > + * deadlock, because who would walk page table in the opposite order? */Linear pagetables make this sort of thing more likely, especially if they''re used by one process to update another process''s mappings. If this is a problem, maybe we''ll need to avoid the deadlock either by allowing multiple lock-holders in the p2m or by rearranging the a/d writeback soit does another p2m lookup -- I''d rather not do that, though, since even on 64-bit it will add a lot of memory accesses. I''m happy to take a version of this big switchover patch that doesn''t solve this problem, though. It can be sorted out in a separate patch.> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/hap/nested_hap.c > --- a/xen/arch/x86/mm/hap/nested_hap.c > +++ b/xen/arch/x86/mm/hap/nested_hap.c > @@ -146,22 +146,29 @@ nestedhap_walk_L0_p2m(struct p2m_domain > mfn_t mfn; > p2m_type_t p2mt; > p2m_access_t p2ma; > + int rc; > > /* walk L0 P2M table */ > mfn = gfn_to_mfn_type_p2m(p2m, L1_gpa >> PAGE_SHIFT, &p2mt, &p2ma, > p2m_query, page_order);Again, are we not changing this function''s name? Cheers, Tim. 
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
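A sketch of the alternative interface Tim suggests above, in which the caller keeps the gfn it mapped and does the put_gfn() itself instead of hvm_unmap_guest_frame() re-deriving it from a virtual address. It assumes the existing hvm_map_guest_frame_rw() wrapper and the patch's behaviour that the map path has already claimed the gfn; the function name and body are illustrative only, not part of the posted patch.

/* Illustrative only: the caller remembers the gfn, the unmap stays dumb. */
static void use_and_release_guest_frame(unsigned long gfn)
{
    void *p = hvm_map_guest_frame_rw(gfn);   /* map path did the get_gfn() */

    if ( p )
    {
        /* ... use the mapping ... */
        unmap_domain_page(p);
        put_gfn(current->domain, gfn);       /* explicit, no gva->gfn redo */
    }
}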
Tim Deegan
2011-Nov-10 12:27 UTC
Re: [Xen-devel] [PATCH 0 of 5] p2m synchronization groundwork
At 22:28 -0500 on 07 Nov (1320704908), Andres Lagar-Cavilla wrote:> This patch series lays the groundwork for improving the synchronization > of primitives accessing the p2m. > > This is a partial repost of the patches emailed previously as a RFC. > These patches are now intended for committing to the tree. > > We change the API for accessing the p2m to a family of functions > get_gfn/put_gfn. The name intends to reflect the fact that even lookups > are meant to obtain exclusive access to a p2m entry, and that said access > should be relinquished (put_gfn) when done. > > The patches, however, alter little functionality. The API name change > does not involve yet additional locking or ref-counting. They will, however, > throw a "barrier" that will force any new commits to conform to the new API. > > Patches are based off 24066:54a5e994a241. Should the new XENMEM calls be > accepted before this, the series needs to be updated to also change the API > there. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.Patches 0-4 applied, thanks. I''ve reviewed patch 5 separately. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-10 13:54 UTC
Re: [Xen-devel] [PATCH 5 of 5] Modify naming of queries into the p2m
Ok, thanks. I''ll try to address those comments asap in a new version. Off-the-bat: - sharing deadlock: I can arrange it so that shr_lock is always taken after get_gfn. That would still require either disabling deadlock detection or moving shr_lock down the deadlock order. Which one is preferable? The real solution that I have queued is removing the global hash table, and locking each shared page individually using PGT_locked. - get_mfn: the goal is additional belts-and-braces. It doesn''t require an extra argument. Put_gfn will put the mfn that sits in the p2m entry at the time of put. A few tricks are needed to handle remove_page, sharing, etc. Works for me with windows and linux hvm guests (so far) - I can rename gfn_to_mfn_type_p2m to get_gfn_type_access (since that''s what the remaining uses are for) Thanks for your feedback on the nested p2m functions, which I quite don''t get yet. Andres> Hi, > > This mostly looks good; I have a few comments on the detail below. > If those are addressed I hope I can apply the next version. > > Any other maintainers want to object to this renaming? > > At 22:28 -0500 on 07 Nov (1320704913), Andres Lagar-Cavilla wrote: >> @@ -1182,6 +1195,7 @@ int hvm_hap_nested_page_fault(unsigned l >> mfn_t mfn; >> struct vcpu *v = current; >> struct p2m_domain *p2m; >> + int rc; >> >> /* On Nested Virtualization, walk the guest page table. >> * If this succeeds, all is fine. >> @@ -1255,8 +1269,8 @@ int hvm_hap_nested_page_fault(unsigned l >> if ( violation ) >> { >> p2m_mem_access_check(gpa, gla_valid, gla, access_r, >> access_w, access_x); >> - >> - return 1; >> + rc = 1; >> + goto out_put_gfn; > > Shouldn''t this patch be changing the call to gfn_to_mfn_type_p2m() just > above here to a get_gfn_*() call too? > >> -void hvm_unmap_guest_frame(void *p) >> +void hvm_unmap_guest_frame(void *p, unsigned long addr, int is_va) >> { >> + /* We enter this function with a map obtained in >> __hvm_map_guest_frame. >> + * This map performed a p2m query that locked the gfn entry and got >> + * a ref on the mfn. Must undo */ >> if ( p ) >> + { >> + unsigned long gfn = INVALID_GFN; >> + >> + if ( is_va ) >> + { >> + if ( addr ) >> + { >> + uint32_t pfec = PFEC_page_present; >> + gfn = paging_gva_to_gfn(current, addr, &pfec); >> + } else { >> + gfn = INVALID_GFN; >> + } >> + } else { >> + gfn = addr; >> + } >> + >> + if ( gfn != INVALID_GFN ) >> + put_gfn(current->domain, gfn); >> + >> unmap_domain_page(p); >> + } >> } > > This is a pretty wierd-looking function now - I think it would be better > just to make all callers have to remember the GFN. In fact, since a > guest VCPU can change its pagetables at any time, it''s not safe to use > paging_gva_to_gfn() to regenerate the GFN from a VA. > > Also, IIRC the nested-SVM code keeps the hvm_map mapping around for > quite a long time so in fact this unmap-and-put-gfn may not be the right > interface. Maybe the caller should put_gfn() explicitly. 
> >> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/svm/svm.c >> --- a/xen/arch/x86/hvm/svm/svm.c >> +++ b/xen/arch/x86/hvm/svm/svm.c >> @@ -244,9 +244,10 @@ static int svm_vmcb_restore(struct vcpu >> { >> if ( c->cr0 & X86_CR0_PG ) >> { >> - mfn = mfn_x(gfn_to_mfn(v->domain, c->cr3 >> PAGE_SHIFT, >> &p2mt)); >> + mfn = mfn_x(get_gfn(v->domain, c->cr3 >> PAGE_SHIFT, >> &p2mt)); >> if ( !p2m_is_ram(p2mt) || !get_page(mfn_to_page(mfn), >> v->domain) ) >> { >> + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); >> gdprintk(XENLOG_ERR, "Invalid CR3 value=0x%"PRIx64"\n", >> c->cr3); >> return -EINVAL; >> @@ -257,6 +258,8 @@ static int svm_vmcb_restore(struct vcpu >> put_page(pagetable_get_page(v->arch.guest_table)); >> >> v->arch.guest_table = pagetable_from_pfn(mfn); >> + if ( c->cr0 & X86_CR0_PG ) >> + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); >> } >> >> v->arch.hvm_vcpu.guest_cr[0] = c->cr0 | X86_CR0_ET; >> @@ -1144,7 +1147,6 @@ static void svm_do_nested_pgfault(struct >> unsigned long gfn = gpa >> PAGE_SHIFT; >> mfn_t mfn; >> p2m_type_t p2mt; >> - p2m_access_t p2ma; >> struct p2m_domain *p2m = NULL; >> >> ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0); >> @@ -1161,7 +1163,8 @@ static void svm_do_nested_pgfault(struct >> p2m = p2m_get_p2m(v); >> _d.gpa = gpa; >> _d.qualification = 0; >> - _d.mfn = mfn_x(gfn_to_mfn_type_p2m(p2m, gfn, &_d.p2mt, &p2ma, >> p2m_query, NULL)); >> + mfn = get_gfn_query_unlocked(p2m->domain, gfn, &_d.p2mt); >> + _d.mfn = mfn_x(mfn); > > Ah - this is not quite the same thing. This lookup uses a specific p2m > table (possibly a nested-p2m table reflecting the fact the guest is > running in nested SVM mode) so it can''t be replaced by a call that just > takes the domain pointer (and will internally use the domain''s master > p2m table). > > The naming of ''p2m_get_p2m()'' is particularly unhelpful here, I realise. > It fetches the running p2m, i.e. the one that hardware sees as an NPT > table; almost everywhere else uses p2m_get_hostp2m(), which > fetches the master p2m. But in general when you see the > gfn_to_mfn_*_p2m() functions, that take an explicit p2m pointer, you > need to be careful. > >> >> __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); >> } >> @@ -1181,7 +1184,7 @@ static void svm_do_nested_pgfault(struct >> if ( p2m == NULL ) >> p2m = p2m_get_p2m(v); >> /* Everything else is an error. */ >> - mfn = gfn_to_mfn_type_p2m(p2m, gfn, &p2mt, &p2ma, p2m_guest, NULL); >> + mfn = get_gfn_guest_unlocked(p2m->domain, gfn, &p2mt); > > Likewise here. > >> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/guest_walk.c >> --- a/xen/arch/x86/mm/guest_walk.c >> +++ b/xen/arch/x86/mm/guest_walk.c >> @@ -86,30 +86,33 @@ static uint32_t set_ad_bits(void *guest_ >> return 0; >> } >> >> +/* If the map is non-NULL, we leave this function having >> + * called get_gfn, you need to call put_gfn. */ >> static inline void *map_domain_gfn(struct p2m_domain *p2m, >> gfn_t gfn, >> mfn_t *mfn, >> p2m_type_t *p2mt, >> uint32_t *rc) >> { >> - p2m_access_t a; >> - >> /* Translate the gfn, unsharing if shared */ >> - *mfn = gfn_to_mfn_type_p2m(p2m, gfn_x(gfn), p2mt, &a, p2m_unshare, >> NULL); >> + *mfn = get_gfn_unshare(p2m->domain, gfn_x(gfn), p2mt); > > Here''s another case where we need to handle the nested-hap path; the p2m > pointer here must be propagated into the lookup function. 
> >> if ( p2m_is_paging(*p2mt) ) >> { >> ASSERT(!p2m_is_nestedp2m(p2m)); >> p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); >> + put_gfn(p2m->domain, gfn_x(gfn)); >> *rc = _PAGE_PAGED; >> return NULL; >> } >> if ( p2m_is_shared(*p2mt) ) >> { >> + put_gfn(p2m->domain, gfn_x(gfn)); >> *rc = _PAGE_SHARED; >> return NULL; >> } >> if ( !p2m_is_ram(*p2mt) ) >> { >> + put_gfn(p2m->domain, gfn_x(gfn)); >> *rc |= _PAGE_PRESENT; >> return NULL; >> } >> @@ -120,6 +123,9 @@ static inline void *map_domain_gfn(struc >> >> >> /* Walk the guest pagetables, after the manner of a hardware walker. */ >> +/* Because the walk is essentially random, it can cause a deadlock >> + * warning in the p2m locking code. Highly unlikely this is an actual >> + * deadlock, because who would walk page table in the opposite order? >> */ > > Linear pagetables make this sort of thing more likely, especially if > they''re used by one process to update another process''s mappings. > > If this is a problem, maybe we''ll need to avoid the deadlock either by > allowing multiple lock-holders in the p2m or by rearranging the a/d > writeback soit does another p2m lookup -- I''d rather not do that, > though, since even on 64-bit it will add a lot of memory accesses. > > I''m happy to take a version of this big switchover patch that doesn''t > solve this problem, though. It can be sorted out in a separate patch. > >> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/hap/nested_hap.c >> --- a/xen/arch/x86/mm/hap/nested_hap.c >> +++ b/xen/arch/x86/mm/hap/nested_hap.c >> @@ -146,22 +146,29 @@ nestedhap_walk_L0_p2m(struct p2m_domain >> mfn_t mfn; >> p2m_type_t p2mt; >> p2m_access_t p2ma; >> + int rc; >> >> /* walk L0 P2M table */ >> mfn = gfn_to_mfn_type_p2m(p2m, L1_gpa >> PAGE_SHIFT, &p2mt, &p2ma, >> p2m_query, page_order); > > Again, are we not changing this function''s name? > > > Cheers, > > Tim. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
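A sketch of the put_gfn() direction Andres describes above, "put the mfn that sits in the p2m entry at the time of put", under the assumption that a future get_gfn() takes a page reference on the backing frame. put_gfn_sketch() is a hypothetical name, not the committed implementation; the helpers it calls are the ones already visible in the quoted hunks.

/* Hypothetical sketch of a ref-counting put_gfn(): drop a reference on
 * whatever mfn currently backs the entry, so callers never pass an mfn. */
static inline void put_gfn_sketch(struct domain *d, unsigned long gfn)
{
    p2m_type_t t;
    mfn_t mfn = get_gfn_query_unlocked(d, gfn, &t);   /* peek at the entry */

    if ( p2m_is_ram(t) && mfn_x(mfn) != INVALID_MFN )
        put_page(mfn_to_page(mfn_x(mfn)));

    /* ... plus release whatever per-entry lock get_gfn() had taken ... */
}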
Andres Lagar-Cavilla
2011-Nov-10 16:47 UTC
Re: [Xen-devel] [PATCH 5 of 5] Modify naming of queries into the p2m
Tim, see below re nested-SVM code:> Hi, > > This mostly looks good; I have a few comments on the detail below. > If those are addressed I hope I can apply the next version. > > Any other maintainers want to object to this renaming? > > At 22:28 -0500 on 07 Nov (1320704913), Andres Lagar-Cavilla wrote: >> @@ -1182,6 +1195,7 @@ int hvm_hap_nested_page_fault(unsigned l >> mfn_t mfn; >> struct vcpu *v = current; >> struct p2m_domain *p2m; >> + int rc; >> >> /* On Nested Virtualization, walk the guest page table. >> * If this succeeds, all is fine. >> @@ -1255,8 +1269,8 @@ int hvm_hap_nested_page_fault(unsigned l >> if ( violation ) >> { >> p2m_mem_access_check(gpa, gla_valid, gla, access_r, >> access_w, access_x); >> - >> - return 1; >> + rc = 1; >> + goto out_put_gfn; > > Shouldn''t this patch be changing the call to gfn_to_mfn_type_p2m() just > above here to a get_gfn_*() call too? > >> -void hvm_unmap_guest_frame(void *p) >> +void hvm_unmap_guest_frame(void *p, unsigned long addr, int is_va) >> { >> + /* We enter this function with a map obtained in >> __hvm_map_guest_frame. >> + * This map performed a p2m query that locked the gfn entry and got >> + * a ref on the mfn. Must undo */ >> if ( p ) >> + { >> + unsigned long gfn = INVALID_GFN; >> + >> + if ( is_va ) >> + { >> + if ( addr ) >> + { >> + uint32_t pfec = PFEC_page_present; >> + gfn = paging_gva_to_gfn(current, addr, &pfec); >> + } else { >> + gfn = INVALID_GFN; >> + } >> + } else { >> + gfn = addr; >> + } >> + >> + if ( gfn != INVALID_GFN ) >> + put_gfn(current->domain, gfn); >> + >> unmap_domain_page(p); >> + } >> } > > This is a pretty wierd-looking function now - I think it would be better > just to make all callers have to remember the GFN. In fact, since a > guest VCPU can change its pagetables at any time, it''s not safe to use > paging_gva_to_gfn() to regenerate the GFN from a VA. > > Also, IIRC the nested-SVM code keeps the hvm_map mapping around for > quite a long time so in fact this unmap-and-put-gfn may not be the right > interface. Maybe the caller should put_gfn() explicitly.I just looked into nestedsvm_vmcb_map. It would seem that the map it stores in nv_vvmcx survives beyond the vmexit. So, if we don''t put_gfn we would be entering the scheduler in an atomic context. The shorthand solution is to put_gfn, but that would render the map potentially unsafe on re-entry. I don''t think so, so I''ll push forward with that. Is it possible to teardown and re-establish the maps on each hypervisor entry/exit? That would seem like the ultimate solution. 
Andres> >> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/hvm/svm/svm.c >> --- a/xen/arch/x86/hvm/svm/svm.c >> +++ b/xen/arch/x86/hvm/svm/svm.c >> @@ -244,9 +244,10 @@ static int svm_vmcb_restore(struct vcpu >> { >> if ( c->cr0 & X86_CR0_PG ) >> { >> - mfn = mfn_x(gfn_to_mfn(v->domain, c->cr3 >> PAGE_SHIFT, >> &p2mt)); >> + mfn = mfn_x(get_gfn(v->domain, c->cr3 >> PAGE_SHIFT, >> &p2mt)); >> if ( !p2m_is_ram(p2mt) || !get_page(mfn_to_page(mfn), >> v->domain) ) >> { >> + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); >> gdprintk(XENLOG_ERR, "Invalid CR3 value=0x%"PRIx64"\n", >> c->cr3); >> return -EINVAL; >> @@ -257,6 +258,8 @@ static int svm_vmcb_restore(struct vcpu >> put_page(pagetable_get_page(v->arch.guest_table)); >> >> v->arch.guest_table = pagetable_from_pfn(mfn); >> + if ( c->cr0 & X86_CR0_PG ) >> + put_gfn(v->domain, c->cr3 >> PAGE_SHIFT); >> } >> >> v->arch.hvm_vcpu.guest_cr[0] = c->cr0 | X86_CR0_ET; >> @@ -1144,7 +1147,6 @@ static void svm_do_nested_pgfault(struct >> unsigned long gfn = gpa >> PAGE_SHIFT; >> mfn_t mfn; >> p2m_type_t p2mt; >> - p2m_access_t p2ma; >> struct p2m_domain *p2m = NULL; >> >> ret = hvm_hap_nested_page_fault(gpa, 0, ~0ul, 0, 0, 0, 0); >> @@ -1161,7 +1163,8 @@ static void svm_do_nested_pgfault(struct >> p2m = p2m_get_p2m(v); >> _d.gpa = gpa; >> _d.qualification = 0; >> - _d.mfn = mfn_x(gfn_to_mfn_type_p2m(p2m, gfn, &_d.p2mt, &p2ma, >> p2m_query, NULL)); >> + mfn = get_gfn_query_unlocked(p2m->domain, gfn, &_d.p2mt); >> + _d.mfn = mfn_x(mfn); > > Ah - this is not quite the same thing. This lookup uses a specific p2m > table (possibly a nested-p2m table reflecting the fact the guest is > running in nested SVM mode) so it can''t be replaced by a call that just > takes the domain pointer (and will internally use the domain''s master > p2m table). > > The naming of ''p2m_get_p2m()'' is particularly unhelpful here, I realise. > It fetches the running p2m, i.e. the one that hardware sees as an NPT > table; almost everywhere else uses p2m_get_hostp2m(), which > fetches the master p2m. But in general when you see the > gfn_to_mfn_*_p2m() functions, that take an explicit p2m pointer, you > need to be careful. > >> >> __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); >> } >> @@ -1181,7 +1184,7 @@ static void svm_do_nested_pgfault(struct >> if ( p2m == NULL ) >> p2m = p2m_get_p2m(v); >> /* Everything else is an error. */ >> - mfn = gfn_to_mfn_type_p2m(p2m, gfn, &p2mt, &p2ma, p2m_guest, NULL); >> + mfn = get_gfn_guest_unlocked(p2m->domain, gfn, &p2mt); > > Likewise here. > >> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/guest_walk.c >> --- a/xen/arch/x86/mm/guest_walk.c >> +++ b/xen/arch/x86/mm/guest_walk.c >> @@ -86,30 +86,33 @@ static uint32_t set_ad_bits(void *guest_ >> return 0; >> } >> >> +/* If the map is non-NULL, we leave this function having >> + * called get_gfn, you need to call put_gfn. */ >> static inline void *map_domain_gfn(struct p2m_domain *p2m, >> gfn_t gfn, >> mfn_t *mfn, >> p2m_type_t *p2mt, >> uint32_t *rc) >> { >> - p2m_access_t a; >> - >> /* Translate the gfn, unsharing if shared */ >> - *mfn = gfn_to_mfn_type_p2m(p2m, gfn_x(gfn), p2mt, &a, p2m_unshare, >> NULL); >> + *mfn = get_gfn_unshare(p2m->domain, gfn_x(gfn), p2mt); > > Here''s another case where we need to handle the nested-hap path; the p2m > pointer here must be propagated into the lookup function. 
> >> if ( p2m_is_paging(*p2mt) ) >> { >> ASSERT(!p2m_is_nestedp2m(p2m)); >> p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); >> + put_gfn(p2m->domain, gfn_x(gfn)); >> *rc = _PAGE_PAGED; >> return NULL; >> } >> if ( p2m_is_shared(*p2mt) ) >> { >> + put_gfn(p2m->domain, gfn_x(gfn)); >> *rc = _PAGE_SHARED; >> return NULL; >> } >> if ( !p2m_is_ram(*p2mt) ) >> { >> + put_gfn(p2m->domain, gfn_x(gfn)); >> *rc |= _PAGE_PRESENT; >> return NULL; >> } >> @@ -120,6 +123,9 @@ static inline void *map_domain_gfn(struc >> >> >> /* Walk the guest pagetables, after the manner of a hardware walker. */ >> +/* Because the walk is essentially random, it can cause a deadlock >> + * warning in the p2m locking code. Highly unlikely this is an actual >> + * deadlock, because who would walk page table in the opposite order? >> */ > > Linear pagetables make this sort of thing more likely, especially if > they''re used by one process to update another process''s mappings. > > If this is a problem, maybe we''ll need to avoid the deadlock either by > allowing multiple lock-holders in the p2m or by rearranging the a/d > writeback soit does another p2m lookup -- I''d rather not do that, > though, since even on 64-bit it will add a lot of memory accesses. > > I''m happy to take a version of this big switchover patch that doesn''t > solve this problem, though. It can be sorted out in a separate patch. > >> diff -r f07d4ebe5248 -r e9fd07479f68 xen/arch/x86/mm/hap/nested_hap.c >> --- a/xen/arch/x86/mm/hap/nested_hap.c >> +++ b/xen/arch/x86/mm/hap/nested_hap.c >> @@ -146,22 +146,29 @@ nestedhap_walk_L0_p2m(struct p2m_domain >> mfn_t mfn; >> p2m_type_t p2mt; >> p2m_access_t p2ma; >> + int rc; >> >> /* walk L0 P2M table */ >> mfn = gfn_to_mfn_type_p2m(p2m, L1_gpa >> PAGE_SHIFT, &p2mt, &p2ma, >> p2m_query, page_order); > > Again, are we not changing this function''s name? > > > Cheers, > > Tim. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
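A sketch of the "shorthand" Andres settles on for nested SVM: establish the long-lived vvmcb mapping, then put_gfn() right away so no p2m entry is still claimed if the vcpu later goes through the scheduler. struct nestedvcpu, vcpu_nestedhvm() and nv_vvmcx follow the existing nested-HVM code, hvm_map_guest_frame_rw() is assumed to behave as in the patch, and the function below is illustrative rather than the real nestedsvm_vmcb_map().

/* Illustrative only: cache the guest's vmcb mapping for the long haul,
 * but release the p2m entry rather than holding it across VMRUN/VMEXIT. */
static int vvmcb_map_release_gfn(struct vcpu *v, uint64_t vmcb_gpa)
{
    struct nestedvcpu *nv = &vcpu_nestedhvm(v);
    unsigned long gfn = vmcb_gpa >> PAGE_SHIFT;

    if ( nv->nv_vvmcx == NULL )
    {
        nv->nv_vvmcx = hvm_map_guest_frame_rw(gfn);
        if ( nv->nv_vvmcx == NULL )
            return 0;
        /* Map cached in nv_vvmcx; don't keep the gfn claimed as well. */
        put_gfn(v->domain, gfn);
    }
    return 1;
}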