Andres Lagar-Cavilla
2011-Nov-08  21:42 UTC
[Xen-devel] [PATCH 0 of 3] p2m synchronization second part
In this patch series we add actual synchronization. We first refine the API update posted previously (I can fold the previous and this first patch together, keep separate, whatever) Then, we make p2m-lookups actually lock the p2m. For now it''s still the global p2m lock. It can be eventually made more fine grained. Finally, we ensure that within a get_gfn/put_gfn critical section, the caller has an additional ref on the underlying mfn. This requires some trickery for manipulations that remove the mfn or swap it (sharing). Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> xen/arch/x86/hvm/hvm.c | 3 +- xen/arch/x86/mm/p2m.c | 8 +++--- xen/common/grant_table.c | 2 +- xen/common/memory.c | 6 +++- xen/arch/x86/mm/mm-locks.h | 13 +++++---- xen/arch/x86/mm/p2m.c | 18 +++++++++++++- xen/include/asm-x86/p2m.h | 39 ++++++++++++++++++----------- xen/arch/x86/mm.c | 18 +++++++++---- xen/arch/x86/mm/mem_sharing.c | 13 +++------ xen/arch/x86/mm/p2m.c | 56 +++++++++++++++++++++++++++++++++++++++++- xen/common/grant_table.c | 4 +- xen/common/memory.c | 10 +++--- xen/include/asm-x86/mm.h | 3 +- xen/include/asm-x86/p2m.h | 10 ++++++- xen/include/xen/paging.h | 2 +- xen/include/xen/tmem_xen.h | 2 +- 16 files changed, 148 insertions(+), 59 deletions(-) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-08  21:42 UTC
[Xen-devel] [PATCH 1 of 3] Improvements over API change for p2m lookups
xen/arch/x86/hvm/hvm.c   |  3 +--
 xen/arch/x86/mm/p2m.c    |  8 ++++----
 xen/common/grant_table.c |  2 +-
 xen/common/memory.c      |  6 +++++-
 4 files changed, 11 insertions(+), 8 deletions(-)
Ranging from the cosmetic to actual bug fixing.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
diff -r 390d6dc6c34b -r a0c55cc5d696 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1791,8 +1791,7 @@ int hvm_virtual_to_linear_addr(
     return 0;
 }
 
-/* We leave this function holding a lock on the p2m entry and a ref
- * on the mapped mfn */
+/* We leave this function holding a lock on the p2m entry */
 static void *__hvm_map_guest_frame(unsigned long gfn, bool_t writable)
 {
     unsigned long mfn;
diff -r 390d6dc6c34b -r a0c55cc5d696 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -349,21 +349,21 @@ void p2m_teardown(struct p2m_domain *p2m
     if (p2m == NULL)
         return;
 
+    p2m_lock(p2m);
+
 #ifdef __x86_64__
     for ( gfn=0; gfn < p2m->max_mapped_pfn; gfn++ )
     {
-        mfn = get_gfn_query(d, gfn, &t);
+        p2m_access_t a;
+        mfn = p2m->get_entry(p2m, gfn, &t, &a, p2m_query, NULL);
         if ( mfn_valid(mfn) && (t == p2m_ram_shared) )
         {
             ASSERT(!p2m_is_nestedp2m(p2m));
             BUG_ON(mem_sharing_unshare_page(d, gfn, MEM_SHARING_DESTROY_GFN));
         }
-        put_gfn(d, gfn);
     }
 #endif
 
-    p2m_lock(p2m);
-
     p2m->phys_table = pagetable_null();
 
     while ( (pg = page_list_remove_head(&p2m->pages)) )
diff -r 390d6dc6c34b -r a0c55cc5d696 xen/common/grant_table.c
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -470,7 +470,7 @@ __gnttab_map_grant_ref(
     struct domain *ld, *rd, *owner;
     struct vcpu   *led;
     int            handle;
-    unsigned long gfn = INVALID_GFN;
+    unsigned long  gfn = INVALID_GFN;
     unsigned long  frame = 0, nr_gets = 0;
     struct page_info *pg;
     int            rc = GNTST_okay;
diff -r 390d6dc6c34b -r a0c55cc5d696 xen/common/memory.c
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -270,7 +270,7 @@ static long memory_exchange(XEN_GUEST_HA
     PAGE_LIST_HEAD(out_chunk_list);
     unsigned long in_chunk_order, out_chunk_order;
     xen_pfn_t     gpfn, gmfn, mfn;
-    unsigned long i, j, k;
+    unsigned long i, j, k = 0; /* gcc ... */
     unsigned int  memflags = 0;
     long          rc = 0;
     struct domain *d;
@@ -496,8 +496,12 @@ static long memory_exchange(XEN_GUEST_HA
  fail:
     /* Reassign any input pages we managed to steal. */
     while ( (page = page_list_remove_head(&in_chunk_list)) )
+    {
+        put_gfn(d, gmfn + k--);
         if ( assign_pages(d, page, 0, MEMF_no_refcount) )
             BUG();
+    }
+
  dying:
     rcu_unlock_domain(d);
     /* Free any output pages we managed to allocate. */
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-08  21:42 UTC
[Xen-devel] [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
xen/arch/x86/mm/mm-locks.h |  13 +++++++------
 xen/arch/x86/mm/p2m.c      |  18 +++++++++++++++++-
 xen/include/asm-x86/p2m.h  |  39 ++++++++++++++++++++++++---------------
 3 files changed, 48 insertions(+), 22 deletions(-)
We achieve this by locking/unlocking the global p2m_lock in get/put_gfn.
This brings about a few consequences for the p2m_lock:
 - not ordered anymore in mm-locks.h: unshare does get_gfn -> shr_lock,
   there are code paths that do paging_lock -> get_gfn. All of these
   would cause mm-locks.h to panic.
 - the lock is always taken recursively, as there are many paths that
   do get_gfn -> p2m_lock
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
diff -r a0c55cc5d696 -r 6203a0549d8a xen/arch/x86/mm/mm-locks.h
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -165,14 +165,15 @@ declare_mm_lock(nestedp2m)
 
 /* P2M lock (per-p2m-table)
  * 
- * This protects all updates to the p2m table.  Updates are expected to
- * be safe against concurrent reads, which do *not* require the lock. */
+ * This protects all queries and updates to the p2m table. Because queries
+ * can happen interleaved with other locks in here, we do not enforce
+ * ordering, and make all locking recursive. */
 
-declare_mm_lock(p2m)
-#define p2m_lock(p)           mm_lock(p2m, &(p)->lock)
-#define p2m_lock_recursive(p) mm_lock_recursive(p2m, &(p)->lock)
-#define p2m_unlock(p)         mm_unlock(&(p)->lock)
+#define p2m_lock(p)           spin_lock_recursive(&(p)->lock.lock)
+#define p2m_lock_recursive(p) spin_lock_recursive(&(p)->lock.lock)
+#define p2m_unlock(p)         spin_unlock_recursive(&(p)->lock.lock)
 #define p2m_locked_by_me(p)   mm_locked_by_me(&(p)->lock)
+#define gfn_locked_by_me(p,g) mm_locked_by_me(&(p)->lock)
 
 /* Page alloc lock (per-domain)
  *
diff -r a0c55cc5d696 -r 6203a0549d8a xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -158,6 +158,9 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom
         return _mfn(gfn);
     }
 
+    /* Grab the lock here, don''t release until put_gfn */
+    p2m_lock(p2m);
+
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
 
 #ifdef __x86_64__
@@ -182,6 +185,19 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom
     return mfn;
 }
 
+void put_gfn(struct domain *d, unsigned long gfn)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( !p2m || !paging_mode_translate(d) )
+        /* Nothing to do in this case */
+        return;
+
+    ASSERT(p2m_locked_by_me(p2m));
+
+    p2m_unlock(p2m);
+}
+
 int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
                   unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
@@ -190,7 +206,7 @@ int set_p2m_entry(struct p2m_domain *p2m
     unsigned int order;
     int rc = 1;
 
-    ASSERT(p2m_locked_by_me(p2m));
+    ASSERT(gfn_locked_by_me(p2m, gfn));
 
     while ( todo )
     {
diff -r a0c55cc5d696 -r 6203a0549d8a xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -305,9 +305,10 @@ struct p2m_domain *p2m_get_p2m(struct vc
 
 #define p2m_get_pagetable(p2m)  ((p2m)->phys_table)
 
-/**** p2m query accessors. After calling any of the variants below, you
- * need to call put_gfn(domain, gfn). If you don''t, you''ll
lock the
- * hypervisor. ****/
+/**** p2m query accessors. They lock p2m_lock, and thus serialize
+ * lookups wrt modifications. They _do not_ release the lock on exit.
+ * After calling any of the variants below, caller needs to use
+ * put_gfn. ****/
 
 /* Read a particular P2M table, mapping pages as we go.  Most callers
  * should _not_ call this directly; use the other get_gfn* functions
@@ -349,19 +350,27 @@ static inline unsigned long get_gfn_unty
     return INVALID_MFN;
 }
 
-/* This is a noop for now. */
-static inline void put_gfn(struct domain *d, unsigned long gfn)
-{
-}
+/* Will release the p2m_lock and put the page behind this mapping. */
+void put_gfn(struct domain *d, unsigned long gfn);
 
-/* These are identical for now. The intent is to have the caller not worry 
- * about put_gfn. To only be used in printk''s, crash situations, or to
- * peek at a type. You''re not holding the p2m entry exclsively after
calling
- * this. */
-#define get_gfn_unlocked(d, g, t)         get_gfn_type((d), (g), (t),
p2m_alloc)
-#define get_gfn_query_unlocked(d, g, t)   get_gfn_type((d), (g), (t),
p2m_query)
-#define get_gfn_guest_unlocked(d, g, t)   get_gfn_type((d), (g), (t),
p2m_guest)
-#define get_gfn_unshare_unlocked(d, g, t) get_gfn_type((d), (g), (t),
p2m_unshare)
+/* The intent is to have the caller not worry about put_gfn. They apply to 
+ * very specific situations: debug printk''s, dumps during a domain
crash,
+ * or to peek at a p2m entry/type. Caller is not holding the p2m entry 
+ * exclusively after calling this. */
+#define build_unlocked_accessor(name)                                   \
+    static inline mfn_t get_gfn ## name ## _unlocked(struct domain *d,  \
+                                                unsigned long gfn,      \
+                                                p2m_type_t *t)          \
+    {                                                                   \
+        mfn_t mfn = get_gfn ##name (d, gfn, t);                         \
+        put_gfn(d, gfn);                                                \
+        return mfn;                                                     \
+    }
+
+build_unlocked_accessor()
+build_unlocked_accessor(_query)
+build_unlocked_accessor(_guest)
+build_unlocked_accessor(_unshare)
 
 /* General conversion function from mfn to gfn */
 static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-08  21:42 UTC
[Xen-devel] [PATCH 3 of 3] Make p2m critical sections hold a ref on the underlying mfn
xen/arch/x86/mm.c             |  18 +++++++++----
 xen/arch/x86/mm/mem_sharing.c |  13 +++------
 xen/arch/x86/mm/p2m.c         |  56 +++++++++++++++++++++++++++++++++++++++++-
 xen/common/grant_table.c      |   4 +-
 xen/common/memory.c           |  10 +++---
 xen/include/asm-x86/mm.h      |   3 +-
 xen/include/asm-x86/p2m.h     |  10 ++++++-
 xen/include/xen/paging.h      |   2 +-
 xen/include/xen/tmem_xen.h    |   2 +-
 9 files changed, 89 insertions(+), 29 deletions(-)
For translated domains, critical sections demarcated by a
get_gfn/put_gfn pair will hold an additional ref on the
underlying mfn.
This requires keeping tabs on special manipulations that
change said mfn:
 - physmap remove page
 - sharing
 - steal page
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
diff -r 6203a0549d8a -r 4b684fa74636 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4128,7 +4128,7 @@ static int replace_grant_p2m_mapping(
                  type, mfn_x(old_mfn), frame);
         return GNTST_general_error;
     }
-    guest_physmap_remove_page(d, gfn, frame, 0);
+    guest_physmap_remove_page(d, gfn, frame, 0, HOLDING_GFN);
 
     put_gfn(d, gfn);
     return GNTST_okay;
@@ -4248,8 +4248,10 @@ int donate_page(
 }
 
 int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
+    struct domain *d, struct page_info *page, unsigned int memflags,
+    int holding_gfn)
 {
+    unsigned count;
     unsigned long x, y;
     bool_t drop_dom_ref = 0;
 
@@ -4261,11 +4263,14 @@ int steal_page(
     /*
      * We require there is just one reference (PGC_allocated). We temporarily
      * drop this reference now so that we can safely swizzle the owner.
+     * If, however, this is invoked by a caller holding the p2m entry, there
+     * will be another reference.
      */
+    count = (holding_gfn) ? 1 : 2;
     y = page->count_info;
     do {
         x = y;
-        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (count |
PGC_allocated) )
             goto fail;
         y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
     } while ( y != x );
@@ -4276,7 +4281,7 @@ int steal_page(
     do {
         x = y;
         BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
-    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
+    } while ( (y = cmpxchg(&page->count_info, x, x | count)) != x );
 
     /* Unlink from original owner. */
     if ( !(memflags & MEMF_no_refcount) && !--d->tot_pages )
@@ -4779,7 +4784,8 @@ long arch_memory_op(int op, XEN_GUEST_HA
         {
             if ( is_xen_heap_mfn(prev_mfn) )
                 /* Xen heap frames are simply unhooked from this phys slot. */
-                guest_physmap_remove_page(d, xatp.gpfn, prev_mfn, 0);
+                guest_physmap_remove_page(d, xatp.gpfn, prev_mfn, 0, 
+                                            HOLDING_GFN);
             else
                 /* Normal domain memory is freed, to avoid leaking memory. */
                 guest_remove_page(d, xatp.gpfn);
@@ -4791,7 +4797,7 @@ long arch_memory_op(int op, XEN_GUEST_HA
         gpfn = get_gpfn_from_mfn(mfn);
         ASSERT( gpfn != SHARED_M2P_ENTRY );
         if ( gpfn != INVALID_M2P_ENTRY )
-            guest_physmap_remove_page(d, gpfn, mfn, 0);
+            guest_physmap_remove_page(d, gpfn, mfn, 0, (gpfn == gfn));
 
         /* Map at new location. */
         rc = guest_physmap_add_page(d, xatp.gpfn, mfn, 0);
diff -r 6203a0549d8a -r 4b684fa74636 xen/arch/x86/mm/mem_sharing.c
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -561,7 +561,7 @@ int mem_sharing_share_pages(shr_handle_t
         list_del(&gfn->list);
         d = get_domain_by_id(gfn->domain);
         BUG_ON(!d);
-        BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn) == 0);
+        BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn, 0) == 0);
         put_domain(d);
         list_add(&gfn->list, &se->gfns);
         put_page_and_type(cpage);
@@ -670,14 +670,9 @@ gfn_found:
     unmap_domain_page(s);
     unmap_domain_page(t);
 
-    /* NOTE: set_shared_p2m_entry will switch the underlying mfn. If
-     * we do get_page withing get_gfn, the correct sequence here
-     * should be
-       get_page(page);
-       put_page(old_page);
-     * so that the ref to the old page is dropped, and a ref to
-     * the new page is obtained to later be dropped in put_gfn */
-    BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0);
+    /* NOTE: set_shared_p2m_entry will swap the underlying mfn and the refs. 
+     * It is safe to call put_gfn as usual after this. */
+    BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page), HOLDING_GFN) == 0);
     put_page_and_type(old_page);
 
 private_page_found:    
diff -r 6203a0549d8a -r 4b684fa74636 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -182,6 +182,13 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom
     }
 #endif
 
+    /* Do a get page to get one additional ref on the mfn */
+    if ( mfn_valid(mfn) )
+    {
+        struct page_info *pg = mfn_to_page(mfn);
+        BUG_ON( !page_get_owner_and_reference(pg) );
+    }
+
     return mfn;
 }
 
@@ -195,6 +202,21 @@ void put_gfn(struct domain *d, unsigned 
 
     ASSERT(p2m_locked_by_me(p2m));
 
+    {
+        p2m_access_t a;
+        p2m_type_t t;
+        mfn_t mfn = p2m->get_entry(p2m, gfn, &t, &a, 
+                                    p2m_query, NULL);
+
+        if ( mfn_valid(mfn) )
+        {
+#ifdef __x86_64__
+            if (likely( !(p2m_is_broken(t)) ))
+#endif
+                put_page(mfn_to_page(mfn));
+        }
+    }    
+
     p2m_unlock(p2m);
 }
 
@@ -416,6 +438,28 @@ void p2m_final_teardown(struct domain *d
     p2m_teardown_nestedp2m(d);
 }
 
+/* If the caller has done get_gfn on this entry, then it has a ref on the
+ * old mfn. Swap the refs so put_gfn puts the appropriate ref */
+static inline void __p2m_swap_entry_refs(struct p2m_domain *p2m, 
+                                         unsigned long gfn, mfn_t mfn)
+{
+    p2m_type_t t;
+    p2m_access_t a;
+    mfn_t omfn;
+
+    if ( !paging_mode_translate(p2m->domain) )
+        return;
+
+    ASSERT(gfn_locked_by_me(p2m, gfn));
+
+    omfn = p2m->get_entry(p2m, gfn, &t, &a, p2m_query, NULL);
+
+    if ( mfn_valid(mfn) )
+        BUG_ON( !page_get_owner_and_reference(mfn_to_page(mfn)) );
+
+    if ( mfn_valid(omfn) )
+        put_page(mfn_to_page(omfn));
+}
 
 static void
 p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
@@ -451,11 +495,14 @@ p2m_remove_page(struct p2m_domain *p2m, 
 
 void
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
-                          unsigned long mfn, unsigned int page_order)
+                          unsigned long mfn, unsigned int page_order,
+                          int holding_gfn)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     p2m_lock(p2m);
     audit_p2m(p2m, 1);
+    if (holding_gfn)
+        __p2m_swap_entry_refs(p2m, gfn, _mfn(INVALID_MFN));
     p2m_remove_page(p2m, gfn, mfn, page_order);
     audit_p2m(p2m, 1);
     p2m_unlock(p2m);
@@ -713,7 +760,8 @@ out:
 }
 
 int
-set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, 
+                        int holding_gfn)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int rc = 0;
@@ -730,6 +778,10 @@ set_shared_p2m_entry(struct domain *d, u
      * sharable first */
     ASSERT(p2m_is_shared(ot));
     ASSERT(mfn_valid(omfn));
+
+    if ( holding_gfn )
+        __p2m_swap_entry_refs(p2m, gfn, mfn);
+
     /* XXX: M2P translations have to be handled properly for shared pages */
     set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
 
diff -r 6203a0549d8a -r 4b684fa74636 xen/common/grant_table.c
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1497,7 +1497,7 @@ gnttab_transfer(
             goto copyback;
         }
 
-        if ( steal_page(d, page, 0) < 0 )
+        if ( steal_page(d, page, 0, HOLDING_GFN) < 0 )
         {
             put_gfn(d, gop.mfn);
             gop.status = GNTST_bad_page;
@@ -1505,7 +1505,7 @@ gnttab_transfer(
         }
 
 #ifndef __ia64__ /* IA64 implicitly replaces the old page in steal_page(). */
-        guest_physmap_remove_page(d, gop.mfn, mfn, 0);
+        guest_physmap_remove_page(d, gop.mfn, mfn, 0, HOLDING_GFN);
 #endif
         flush_tlb_mask(d->domain_dirty_cpumask);
 
diff -r 6203a0549d8a -r 4b684fa74636 xen/common/memory.c
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -165,7 +165,7 @@ int guest_remove_page(struct domain *d, 
     mfn = mfn_x(get_gfn(d, gmfn, &p2mt)); 
     if ( unlikely(p2m_is_paging(p2mt)) )
     {
-        guest_physmap_remove_page(d, gmfn, mfn, 0);
+        guest_physmap_remove_page(d, gmfn, mfn, 0, HOLDING_GFN);
         p2m_mem_paging_drop_page(d, gmfn);
         put_gfn(d, gmfn);
         return 1;
@@ -188,7 +188,7 @@ int guest_remove_page(struct domain *d, 
     if(p2m_is_shared(p2mt))
     {
         put_page_and_type(page);
-        guest_physmap_remove_page(d, gmfn, mfn, 0);
+        guest_physmap_remove_page(d, gmfn, mfn, 0, HOLDING_GFN);
         put_gfn(d, gmfn);
         return 1;
     }
@@ -207,7 +207,7 @@ int guest_remove_page(struct domain *d, 
     if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
         put_page(page);
 
-    guest_physmap_remove_page(d, gmfn, mfn, 0);
+    guest_physmap_remove_page(d, gmfn, mfn, 0, HOLDING_GFN);
     put_gfn(d, gmfn);
 
     put_page(page);
@@ -387,7 +387,7 @@ static long memory_exchange(XEN_GUEST_HA
 
                 page = mfn_to_page(mfn);
 
-                if ( unlikely(steal_page(d, page, MEMF_no_refcount)) )
+                if ( unlikely(steal_page(d, page, MEMF_no_refcount,
HOLDING_GFN)) )
                 {
                     put_gfn(d, gmfn + k);
                     rc = -EINVAL;
@@ -427,7 +427,7 @@ static long memory_exchange(XEN_GUEST_HA
             gfn = mfn_to_gmfn(d, mfn);
             /* Pages were unshared above */
             BUG_ON(SHARED_M2P(gfn));
-            guest_physmap_remove_page(d, gfn, mfn, 0);
+            guest_physmap_remove_page(d, gfn, mfn, 0, 0);
             put_page(page);
         }
 
diff -r 6203a0549d8a -r 4b684fa74636 xen/include/asm-x86/mm.h
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -578,7 +578,8 @@ int compat_arch_memory_op(int op, XEN_GU
 int compat_subarch_memory_op(int op, XEN_GUEST_HANDLE(void));
 
 int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags);
+    struct domain *d, struct page_info *page, unsigned int memflags,
+    int holding_gfn);
 int donate_page(
     struct domain *d, struct page_info *page, unsigned int memflags);
 int page_make_sharable(struct domain *d, 
diff -r 6203a0549d8a -r 4b684fa74636 xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -353,6 +353,10 @@ static inline unsigned long get_gfn_unty
 /* Will release the p2m_lock and put the page behind this mapping. */
 void put_gfn(struct domain *d, unsigned long gfn);
 
+/* Operations that change the underlying mfn in a p2m entry need to be 
+ * told whether the caller is holding the current gfn */
+#define HOLDING_GFN 1
+
 /* The intent is to have the caller not worry about put_gfn. They apply to 
  * very specific situations: debug printk''s, dumps during a domain
crash,
  * or to peek at a p2m entry/type. Caller is not holding the p2m entry 
@@ -410,7 +414,8 @@ static inline int guest_physmap_add_page
 /* Remove a page from a domain''s p2m table */
 void guest_physmap_remove_page(struct domain *d,
                                unsigned long gfn,
-                               unsigned long mfn, unsigned int page_order);
+                               unsigned long mfn, unsigned int page_order,
+                               int holding_gfn);
 
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
@@ -471,7 +476,8 @@ p2m_pod_offline_or_broken_replace(struct
 
 #ifdef __x86_64__
 /* Modify p2m table for shared gfn */
-int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
+int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                            int holding_gfn);
 
 /* Check if a nominated gfn is valid to be paged out */
 int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn);
diff -r 6203a0549d8a -r 4b684fa74636 xen/include/xen/paging.h
--- a/xen/include/xen/paging.h
+++ b/xen/include/xen/paging.h
@@ -21,7 +21,7 @@
 #define paging_mode_translate(d)              (0)
 #define paging_mode_external(d)               (0)
 #define guest_physmap_add_page(d, p, m, o)    (0)
-#define guest_physmap_remove_page(d, p, m, o) ((void)0)
+#define guest_physmap_remove_page(d, p, m, o, h)    ((void)0)
 
 #endif
 
diff -r 6203a0549d8a -r 4b684fa74636 xen/include/xen/tmem_xen.h
--- a/xen/include/xen/tmem_xen.h
+++ b/xen/include/xen/tmem_xen.h
@@ -198,7 +198,7 @@ static inline void _tmh_free_page_thispo
     struct domain *d = page_get_owner(pi);
 
     ASSERT(IS_VALID_PAGE(pi));
-    if ( (d == NULL) || steal_page(d,pi,0) == 0 )
+    if ( (d == NULL) || steal_page(d,pi,0,0) == 0 )
         tmh_page_list_put(pi);
     else
     {
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tim Deegan
2011-Nov-10  12:48 UTC
Re: [Xen-devel] [PATCH 1 of 3] Improvements over API change for p2m lookups
At 16:42 -0500 on 08 Nov (1320770544), Andres Lagar-Cavilla wrote:> xen/arch/x86/hvm/hvm.c | 3 +-- > xen/arch/x86/mm/p2m.c | 8 ++++---- > xen/common/grant_table.c | 2 +- > xen/common/memory.c | 6 +++++- > 4 files changed, 11 insertions(+), 8 deletions(-) > > > Ranging from the cosmetic to actual bug fixing. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>These look fine. Please fold them into the next version of the API change patch. Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2011-Nov-10  12:53 UTC
Re: [Xen-devel] [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
At 16:42 -0500 on 08 Nov (1320770545), Andres Lagar-Cavilla wrote:> xen/arch/x86/mm/mm-locks.h | 13 +++++++------ > xen/arch/x86/mm/p2m.c | 18 +++++++++++++++++- > xen/include/asm-x86/p2m.h | 39 ++++++++++++++++++++++++--------------- > 3 files changed, 48 insertions(+), 22 deletions(-) > > > We achieve this by locking/unlocking the global p2m_lock in get/put_gfn. > This brings about a few consequences for the p2m_lock: > > - not ordered anymore in mm-locks.h: unshare does get_gfn -> shr_lock, > there are code paths that do paging_lock -> get_gfn. All of these > would cause mm-locks.h to panic.In that case there''s a potential deadlock in the sharing code, and turning off the safety catches is not an acceptable response to that. :) ISTR you had a plan to get rid of the shr_lock entirely, or am I misremembering? Tim.> - the lock is always taken recursively, as there are many paths that > do get_gfn -> p2m_lock > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > > diff -r a0c55cc5d696 -r 6203a0549d8a xen/arch/x86/mm/mm-locks.h > --- a/xen/arch/x86/mm/mm-locks.h > +++ b/xen/arch/x86/mm/mm-locks.h > @@ -165,14 +165,15 @@ declare_mm_lock(nestedp2m) > > /* P2M lock (per-p2m-table) > * > - * This protects all updates to the p2m table. Updates are expected to > - * be safe against concurrent reads, which do *not* require the lock. */ > + * This protects all queries and updates to the p2m table. Because queries > + * can happen interleaved with other locks in here, we do not enforce > + * ordering, and make all locking recursive. */ > > -declare_mm_lock(p2m) > -#define p2m_lock(p) mm_lock(p2m, &(p)->lock) > -#define p2m_lock_recursive(p) mm_lock_recursive(p2m, &(p)->lock) > -#define p2m_unlock(p) mm_unlock(&(p)->lock) > +#define p2m_lock(p) spin_lock_recursive(&(p)->lock.lock) > +#define p2m_lock_recursive(p) spin_lock_recursive(&(p)->lock.lock) > +#define p2m_unlock(p) spin_unlock_recursive(&(p)->lock.lock) > #define p2m_locked_by_me(p) mm_locked_by_me(&(p)->lock) > +#define gfn_locked_by_me(p,g) mm_locked_by_me(&(p)->lock) > > /* Page alloc lock (per-domain) > * > diff -r a0c55cc5d696 -r 6203a0549d8a xen/arch/x86/mm/p2m.c > --- a/xen/arch/x86/mm/p2m.c > +++ b/xen/arch/x86/mm/p2m.c > @@ -158,6 +158,9 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom > return _mfn(gfn); > } > > + /* Grab the lock here, don''t release until put_gfn */ > + p2m_lock(p2m); > + > mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order); > > #ifdef __x86_64__ > @@ -182,6 +185,19 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom > return mfn; > } > > +void put_gfn(struct domain *d, unsigned long gfn) > +{ > + struct p2m_domain *p2m = p2m_get_hostp2m(d); > + > + if ( !p2m || !paging_mode_translate(d) ) > + /* Nothing to do in this case */ > + return; > + > + ASSERT(p2m_locked_by_me(p2m)); > + > + p2m_unlock(p2m); > +} > + > int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, > unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma) > { > @@ -190,7 +206,7 @@ int set_p2m_entry(struct p2m_domain *p2m > unsigned int order; > int rc = 1; > > - ASSERT(p2m_locked_by_me(p2m)); > + ASSERT(gfn_locked_by_me(p2m, gfn)); > > while ( todo ) > { > diff -r a0c55cc5d696 -r 6203a0549d8a xen/include/asm-x86/p2m.h > --- a/xen/include/asm-x86/p2m.h > +++ b/xen/include/asm-x86/p2m.h > @@ -305,9 +305,10 @@ struct p2m_domain *p2m_get_p2m(struct vc > > #define p2m_get_pagetable(p2m) ((p2m)->phys_table) > > -/**** p2m query accessors. After calling any of the variants below, you > - * need to call put_gfn(domain, gfn). If you don''t, you''ll lock the > - * hypervisor. ****/ > +/**** p2m query accessors. They lock p2m_lock, and thus serialize > + * lookups wrt modifications. They _do not_ release the lock on exit. > + * After calling any of the variants below, caller needs to use > + * put_gfn. ****/ > > /* Read a particular P2M table, mapping pages as we go. Most callers > * should _not_ call this directly; use the other get_gfn* functions > @@ -349,19 +350,27 @@ static inline unsigned long get_gfn_unty > return INVALID_MFN; > } > > -/* This is a noop for now. */ > -static inline void put_gfn(struct domain *d, unsigned long gfn) > -{ > -} > +/* Will release the p2m_lock and put the page behind this mapping. */ > +void put_gfn(struct domain *d, unsigned long gfn); > > -/* These are identical for now. The intent is to have the caller not worry > - * about put_gfn. To only be used in printk''s, crash situations, or to > - * peek at a type. You''re not holding the p2m entry exclsively after calling > - * this. */ > -#define get_gfn_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_alloc) > -#define get_gfn_query_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_query) > -#define get_gfn_guest_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_guest) > -#define get_gfn_unshare_unlocked(d, g, t) get_gfn_type((d), (g), (t), p2m_unshare) > +/* The intent is to have the caller not worry about put_gfn. They apply to > + * very specific situations: debug printk''s, dumps during a domain crash, > + * or to peek at a p2m entry/type. Caller is not holding the p2m entry > + * exclusively after calling this. */ > +#define build_unlocked_accessor(name) \ > + static inline mfn_t get_gfn ## name ## _unlocked(struct domain *d, \ > + unsigned long gfn, \ > + p2m_type_t *t) \ > + { \ > + mfn_t mfn = get_gfn ##name (d, gfn, t); \ > + put_gfn(d, gfn); \ > + return mfn; \ > + } > + > +build_unlocked_accessor() > +build_unlocked_accessor(_query) > +build_unlocked_accessor(_guest) > +build_unlocked_accessor(_unshare) > > /* General conversion function from mfn to gfn */ > static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn) > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2011-Nov-10  13:17 UTC
Re: [Xen-devel] [PATCH 3 of 3] Make p2m critical sections hold a ref on the underlying mfn
Hi, At 16:42 -0500 on 08 Nov (1320770546), Andres Lagar-Cavilla wrote:> For translated domains, critical sections demarcated by a > get_gfn/put_gfn pair will hold an additional ref on the > underlying mfn.Remind me what this gets us again? Is it just belt-and-braces on top of the locking at the p2m layer? It should be possible to do this without the extra argument - for example, the bottom-level function that actually changes a p2m entry must hold the p2m exclusion lock for that entry, or the master p2m lock, right? In either case it knows whether it has a ref on the mfn. I suppose having these locks be recursive makes that a problem, but in that case you have another problem -- how many mfn refs need to be moved? Cheers, Tim (confused).> This requires keeping tabs on special manipulations that > change said mfn: > - physmap remove page > - sharing > - steal page > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > > diff -r 6203a0549d8a -r 4b684fa74636 xen/arch/x86/mm.c > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -4128,7 +4128,7 @@ static int replace_grant_p2m_mapping( > type, mfn_x(old_mfn), frame); > return GNTST_general_error; > } > - guest_physmap_remove_page(d, gfn, frame, 0); > + guest_physmap_remove_page(d, gfn, frame, 0, HOLDING_GFN); > > put_gfn(d, gfn); > return GNTST_okay; > @@ -4248,8 +4248,10 @@ int donate_page( > } > > int steal_page( > - struct domain *d, struct page_info *page, unsigned int memflags) > + struct domain *d, struct page_info *page, unsigned int memflags, > + int holding_gfn) > { > + unsigned count; > unsigned long x, y; > bool_t drop_dom_ref = 0; > > @@ -4261,11 +4263,14 @@ int steal_page( > /* > * We require there is just one reference (PGC_allocated). We temporarily > * drop this reference now so that we can safely swizzle the owner. > + * If, however, this is invoked by a caller holding the p2m entry, there > + * will be another reference. > */ > + count = (holding_gfn) ? 1 : 2; > y = page->count_info; > do { > x = y; > - if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) ) > + if ( (x & (PGC_count_mask|PGC_allocated)) != (count | PGC_allocated) ) > goto fail; > y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask); > } while ( y != x ); > @@ -4276,7 +4281,7 @@ int steal_page( > do { > x = y; > BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated); > - } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x ); > + } while ( (y = cmpxchg(&page->count_info, x, x | count)) != x ); > > /* Unlink from original owner. */ > if ( !(memflags & MEMF_no_refcount) && !--d->tot_pages ) > @@ -4779,7 +4784,8 @@ long arch_memory_op(int op, XEN_GUEST_HA > { > if ( is_xen_heap_mfn(prev_mfn) ) > /* Xen heap frames are simply unhooked from this phys slot. */ > - guest_physmap_remove_page(d, xatp.gpfn, prev_mfn, 0); > + guest_physmap_remove_page(d, xatp.gpfn, prev_mfn, 0, > + HOLDING_GFN); > else > /* Normal domain memory is freed, to avoid leaking memory. */ > guest_remove_page(d, xatp.gpfn); > @@ -4791,7 +4797,7 @@ long arch_memory_op(int op, XEN_GUEST_HA > gpfn = get_gpfn_from_mfn(mfn); > ASSERT( gpfn != SHARED_M2P_ENTRY ); > if ( gpfn != INVALID_M2P_ENTRY ) > - guest_physmap_remove_page(d, gpfn, mfn, 0); > + guest_physmap_remove_page(d, gpfn, mfn, 0, (gpfn == gfn)); > > /* Map at new location. */ > rc = guest_physmap_add_page(d, xatp.gpfn, mfn, 0); > diff -r 6203a0549d8a -r 4b684fa74636 xen/arch/x86/mm/mem_sharing.c > --- a/xen/arch/x86/mm/mem_sharing.c > +++ b/xen/arch/x86/mm/mem_sharing.c > @@ -561,7 +561,7 @@ int mem_sharing_share_pages(shr_handle_t > list_del(&gfn->list); > d = get_domain_by_id(gfn->domain); > BUG_ON(!d); > - BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn) == 0); > + BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn, 0) == 0); > put_domain(d); > list_add(&gfn->list, &se->gfns); > put_page_and_type(cpage); > @@ -670,14 +670,9 @@ gfn_found: > unmap_domain_page(s); > unmap_domain_page(t); > > - /* NOTE: set_shared_p2m_entry will switch the underlying mfn. If > - * we do get_page withing get_gfn, the correct sequence here > - * should be > - get_page(page); > - put_page(old_page); > - * so that the ref to the old page is dropped, and a ref to > - * the new page is obtained to later be dropped in put_gfn */ > - BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0); > + /* NOTE: set_shared_p2m_entry will swap the underlying mfn and the refs. > + * It is safe to call put_gfn as usual after this. */ > + BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page), HOLDING_GFN) == 0); > put_page_and_type(old_page); > > private_page_found: > diff -r 6203a0549d8a -r 4b684fa74636 xen/arch/x86/mm/p2m.c > --- a/xen/arch/x86/mm/p2m.c > +++ b/xen/arch/x86/mm/p2m.c > @@ -182,6 +182,13 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom > } > #endif > > + /* Do a get page to get one additional ref on the mfn */ > + if ( mfn_valid(mfn) ) > + { > + struct page_info *pg = mfn_to_page(mfn); > + BUG_ON( !page_get_owner_and_reference(pg) ); > + } > + > return mfn; > } > > @@ -195,6 +202,21 @@ void put_gfn(struct domain *d, unsigned > > ASSERT(p2m_locked_by_me(p2m)); > > + { > + p2m_access_t a; > + p2m_type_t t; > + mfn_t mfn = p2m->get_entry(p2m, gfn, &t, &a, > + p2m_query, NULL); > + > + if ( mfn_valid(mfn) ) > + { > +#ifdef __x86_64__ > + if (likely( !(p2m_is_broken(t)) )) > +#endif > + put_page(mfn_to_page(mfn)); > + } > + } > + > p2m_unlock(p2m); > } > > @@ -416,6 +438,28 @@ void p2m_final_teardown(struct domain *d > p2m_teardown_nestedp2m(d); > } > > +/* If the caller has done get_gfn on this entry, then it has a ref on the > + * old mfn. Swap the refs so put_gfn puts the appropriate ref */ > +static inline void __p2m_swap_entry_refs(struct p2m_domain *p2m, > + unsigned long gfn, mfn_t mfn) > +{ > + p2m_type_t t; > + p2m_access_t a; > + mfn_t omfn; > + > + if ( !paging_mode_translate(p2m->domain) ) > + return; > + > + ASSERT(gfn_locked_by_me(p2m, gfn)); > + > + omfn = p2m->get_entry(p2m, gfn, &t, &a, p2m_query, NULL); > + > + if ( mfn_valid(mfn) ) > + BUG_ON( !page_get_owner_and_reference(mfn_to_page(mfn)) ); > + > + if ( mfn_valid(omfn) ) > + put_page(mfn_to_page(omfn)); > +} > > static void > p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn, > @@ -451,11 +495,14 @@ p2m_remove_page(struct p2m_domain *p2m, > > void > guest_physmap_remove_page(struct domain *d, unsigned long gfn, > - unsigned long mfn, unsigned int page_order) > + unsigned long mfn, unsigned int page_order, > + int holding_gfn) > { > struct p2m_domain *p2m = p2m_get_hostp2m(d); > p2m_lock(p2m); > audit_p2m(p2m, 1); > + if (holding_gfn) > + __p2m_swap_entry_refs(p2m, gfn, _mfn(INVALID_MFN)); > p2m_remove_page(p2m, gfn, mfn, page_order); > audit_p2m(p2m, 1); > p2m_unlock(p2m); > @@ -713,7 +760,8 @@ out: > } > > int > -set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn) > +set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, > + int holding_gfn) > { > struct p2m_domain *p2m = p2m_get_hostp2m(d); > int rc = 0; > @@ -730,6 +778,10 @@ set_shared_p2m_entry(struct domain *d, u > * sharable first */ > ASSERT(p2m_is_shared(ot)); > ASSERT(mfn_valid(omfn)); > + > + if ( holding_gfn ) > + __p2m_swap_entry_refs(p2m, gfn, mfn); > + > /* XXX: M2P translations have to be handled properly for shared pages */ > set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY); > > diff -r 6203a0549d8a -r 4b684fa74636 xen/common/grant_table.c > --- a/xen/common/grant_table.c > +++ b/xen/common/grant_table.c > @@ -1497,7 +1497,7 @@ gnttab_transfer( > goto copyback; > } > > - if ( steal_page(d, page, 0) < 0 ) > + if ( steal_page(d, page, 0, HOLDING_GFN) < 0 ) > { > put_gfn(d, gop.mfn); > gop.status = GNTST_bad_page; > @@ -1505,7 +1505,7 @@ gnttab_transfer( > } > > #ifndef __ia64__ /* IA64 implicitly replaces the old page in steal_page(). */ > - guest_physmap_remove_page(d, gop.mfn, mfn, 0); > + guest_physmap_remove_page(d, gop.mfn, mfn, 0, HOLDING_GFN); > #endif > flush_tlb_mask(d->domain_dirty_cpumask); > > diff -r 6203a0549d8a -r 4b684fa74636 xen/common/memory.c > --- a/xen/common/memory.c > +++ b/xen/common/memory.c > @@ -165,7 +165,7 @@ int guest_remove_page(struct domain *d, > mfn = mfn_x(get_gfn(d, gmfn, &p2mt)); > if ( unlikely(p2m_is_paging(p2mt)) ) > { > - guest_physmap_remove_page(d, gmfn, mfn, 0); > + guest_physmap_remove_page(d, gmfn, mfn, 0, HOLDING_GFN); > p2m_mem_paging_drop_page(d, gmfn); > put_gfn(d, gmfn); > return 1; > @@ -188,7 +188,7 @@ int guest_remove_page(struct domain *d, > if(p2m_is_shared(p2mt)) > { > put_page_and_type(page); > - guest_physmap_remove_page(d, gmfn, mfn, 0); > + guest_physmap_remove_page(d, gmfn, mfn, 0, HOLDING_GFN); > put_gfn(d, gmfn); > return 1; > } > @@ -207,7 +207,7 @@ int guest_remove_page(struct domain *d, > if ( test_and_clear_bit(_PGC_allocated, &page->count_info) ) > put_page(page); > > - guest_physmap_remove_page(d, gmfn, mfn, 0); > + guest_physmap_remove_page(d, gmfn, mfn, 0, HOLDING_GFN); > put_gfn(d, gmfn); > > put_page(page); > @@ -387,7 +387,7 @@ static long memory_exchange(XEN_GUEST_HA > > page = mfn_to_page(mfn); > > - if ( unlikely(steal_page(d, page, MEMF_no_refcount)) ) > + if ( unlikely(steal_page(d, page, MEMF_no_refcount, HOLDING_GFN)) ) > { > put_gfn(d, gmfn + k); > rc = -EINVAL; > @@ -427,7 +427,7 @@ static long memory_exchange(XEN_GUEST_HA > gfn = mfn_to_gmfn(d, mfn); > /* Pages were unshared above */ > BUG_ON(SHARED_M2P(gfn)); > - guest_physmap_remove_page(d, gfn, mfn, 0); > + guest_physmap_remove_page(d, gfn, mfn, 0, 0); > put_page(page); > } > > diff -r 6203a0549d8a -r 4b684fa74636 xen/include/asm-x86/mm.h > --- a/xen/include/asm-x86/mm.h > +++ b/xen/include/asm-x86/mm.h > @@ -578,7 +578,8 @@ int compat_arch_memory_op(int op, XEN_GU > int compat_subarch_memory_op(int op, XEN_GUEST_HANDLE(void)); > > int steal_page( > - struct domain *d, struct page_info *page, unsigned int memflags); > + struct domain *d, struct page_info *page, unsigned int memflags, > + int holding_gfn); > int donate_page( > struct domain *d, struct page_info *page, unsigned int memflags); > int page_make_sharable(struct domain *d, > diff -r 6203a0549d8a -r 4b684fa74636 xen/include/asm-x86/p2m.h > --- a/xen/include/asm-x86/p2m.h > +++ b/xen/include/asm-x86/p2m.h > @@ -353,6 +353,10 @@ static inline unsigned long get_gfn_unty > /* Will release the p2m_lock and put the page behind this mapping. */ > void put_gfn(struct domain *d, unsigned long gfn); > > +/* Operations that change the underlying mfn in a p2m entry need to be > + * told whether the caller is holding the current gfn */ > +#define HOLDING_GFN 1 > + > /* The intent is to have the caller not worry about put_gfn. They apply to > * very specific situations: debug printk''s, dumps during a domain crash, > * or to peek at a p2m entry/type. Caller is not holding the p2m entry > @@ -410,7 +414,8 @@ static inline int guest_physmap_add_page > /* Remove a page from a domain''s p2m table */ > void guest_physmap_remove_page(struct domain *d, > unsigned long gfn, > - unsigned long mfn, unsigned int page_order); > + unsigned long mfn, unsigned int page_order, > + int holding_gfn); > > /* Set a p2m range as populate-on-demand */ > int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, > @@ -471,7 +476,8 @@ p2m_pod_offline_or_broken_replace(struct > > #ifdef __x86_64__ > /* Modify p2m table for shared gfn */ > -int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn); > +int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, > + int holding_gfn); > > /* Check if a nominated gfn is valid to be paged out */ > int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn); > diff -r 6203a0549d8a -r 4b684fa74636 xen/include/xen/paging.h > --- a/xen/include/xen/paging.h > +++ b/xen/include/xen/paging.h > @@ -21,7 +21,7 @@ > #define paging_mode_translate(d) (0) > #define paging_mode_external(d) (0) > #define guest_physmap_add_page(d, p, m, o) (0) > -#define guest_physmap_remove_page(d, p, m, o) ((void)0) > +#define guest_physmap_remove_page(d, p, m, o, h) ((void)0) > > #endif > > diff -r 6203a0549d8a -r 4b684fa74636 xen/include/xen/tmem_xen.h > --- a/xen/include/xen/tmem_xen.h > +++ b/xen/include/xen/tmem_xen.h > @@ -198,7 +198,7 @@ static inline void _tmh_free_page_thispo > struct domain *d = page_get_owner(pi); > > ASSERT(IS_VALID_PAGE(pi)); > - if ( (d == NULL) || steal_page(d,pi,0) == 0 ) > + if ( (d == NULL) || steal_page(d,pi,0,0) == 0 ) > tmh_page_list_put(pi); > else > { > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-14  18:03 UTC
Re: [Xen-devel] [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
Hi there,> At 16:42 -0500 on 08 Nov (1320770545), Andres Lagar-Cavilla wrote: >> xen/arch/x86/mm/mm-locks.h | 13 +++++++------ >> xen/arch/x86/mm/p2m.c | 18 +++++++++++++++++- >> xen/include/asm-x86/p2m.h | 39 >> ++++++++++++++++++++++++--------------- >> 3 files changed, 48 insertions(+), 22 deletions(-) >> >> >> We achieve this by locking/unlocking the global p2m_lock in get/put_gfn. >> This brings about a few consequences for the p2m_lock: >> >> - not ordered anymore in mm-locks.h: unshare does get_gfn -> shr_lock, >> there are code paths that do paging_lock -> get_gfn. All of these >> would cause mm-locks.h to panic. > > In that case there''s a potential deadlock in the sharing code, and > turning off the safety catches is not an acceptable response to that. :) > > ISTR you had a plan to get rid of the shr_lock entirely, or am > I misremembering?Sharing is actually fine, I can reorder those safely until I get rid of the shr_lock. Except for sharing audits, which basically lock the whole hypervisor, and _no one is using at all_. I have a more fundamental problem with the paging lock. sh_update_cr3 can be called from a variety of situations. It will walk the four top level PAE mappings, acquiring the p2m entry for each, with the paging lock held. This is an inversion & deadlock, if I try to synchronize p2m lookups with the p2m lock. Any suggestions here? Other than disabling ordering of the p2m lock? Thanks Andres> > Tim. > >> - the lock is always taken recursively, as there are many paths that >> do get_gfn -> p2m_lock >> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> >> >> diff -r a0c55cc5d696 -r 6203a0549d8a xen/arch/x86/mm/mm-locks.h >> --- a/xen/arch/x86/mm/mm-locks.h >> +++ b/xen/arch/x86/mm/mm-locks.h >> @@ -165,14 +165,15 @@ declare_mm_lock(nestedp2m) >> >> /* P2M lock (per-p2m-table) >> * >> - * This protects all updates to the p2m table. Updates are expected to >> - * be safe against concurrent reads, which do *not* require the lock. >> */ >> + * This protects all queries and updates to the p2m table. Because >> queries >> + * can happen interleaved with other locks in here, we do not enforce >> + * ordering, and make all locking recursive. */ >> >> -declare_mm_lock(p2m) >> -#define p2m_lock(p) mm_lock(p2m, &(p)->lock) >> -#define p2m_lock_recursive(p) mm_lock_recursive(p2m, &(p)->lock) >> -#define p2m_unlock(p) mm_unlock(&(p)->lock) >> +#define p2m_lock(p) spin_lock_recursive(&(p)->lock.lock) >> +#define p2m_lock_recursive(p) spin_lock_recursive(&(p)->lock.lock) >> +#define p2m_unlock(p) spin_unlock_recursive(&(p)->lock.lock) >> #define p2m_locked_by_me(p) mm_locked_by_me(&(p)->lock) >> +#define gfn_locked_by_me(p,g) mm_locked_by_me(&(p)->lock) >> >> /* Page alloc lock (per-domain) >> * >> diff -r a0c55cc5d696 -r 6203a0549d8a xen/arch/x86/mm/p2m.c >> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -158,6 +158,9 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom >> return _mfn(gfn); >> } >> >> + /* Grab the lock here, don''t release until put_gfn */ >> + p2m_lock(p2m); >> + >> mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order); >> >> #ifdef __x86_64__ >> @@ -182,6 +185,19 @@ mfn_t gfn_to_mfn_type_p2m(struct p2m_dom >> return mfn; >> } >> >> +void put_gfn(struct domain *d, unsigned long gfn) >> +{ >> + struct p2m_domain *p2m = p2m_get_hostp2m(d); >> + >> + if ( !p2m || !paging_mode_translate(d) ) >> + /* Nothing to do in this case */ >> + return; >> + >> + ASSERT(p2m_locked_by_me(p2m)); >> + >> + p2m_unlock(p2m); >> +} >> + >> int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, >> unsigned int page_order, p2m_type_t p2mt, >> p2m_access_t p2ma) >> { >> @@ -190,7 +206,7 @@ int set_p2m_entry(struct p2m_domain *p2m >> unsigned int order; >> int rc = 1; >> >> - ASSERT(p2m_locked_by_me(p2m)); >> + ASSERT(gfn_locked_by_me(p2m, gfn)); >> >> while ( todo ) >> { >> diff -r a0c55cc5d696 -r 6203a0549d8a xen/include/asm-x86/p2m.h >> --- a/xen/include/asm-x86/p2m.h >> +++ b/xen/include/asm-x86/p2m.h >> @@ -305,9 +305,10 @@ struct p2m_domain *p2m_get_p2m(struct vc >> >> #define p2m_get_pagetable(p2m) ((p2m)->phys_table) >> >> -/**** p2m query accessors. After calling any of the variants below, you >> - * need to call put_gfn(domain, gfn). If you don''t, you''ll lock the >> - * hypervisor. ****/ >> +/**** p2m query accessors. They lock p2m_lock, and thus serialize >> + * lookups wrt modifications. They _do not_ release the lock on exit. >> + * After calling any of the variants below, caller needs to use >> + * put_gfn. ****/ >> >> /* Read a particular P2M table, mapping pages as we go. Most callers >> * should _not_ call this directly; use the other get_gfn* functions >> @@ -349,19 +350,27 @@ static inline unsigned long get_gfn_unty >> return INVALID_MFN; >> } >> >> -/* This is a noop for now. */ >> -static inline void put_gfn(struct domain *d, unsigned long gfn) >> -{ >> -} >> +/* Will release the p2m_lock and put the page behind this mapping. */ >> +void put_gfn(struct domain *d, unsigned long gfn); >> >> -/* These are identical for now. The intent is to have the caller not >> worry >> - * about put_gfn. To only be used in printk''s, crash situations, or to >> - * peek at a type. You''re not holding the p2m entry exclsively after >> calling >> - * this. */ >> -#define get_gfn_unlocked(d, g, t) get_gfn_type((d), (g), (t), >> p2m_alloc) >> -#define get_gfn_query_unlocked(d, g, t) get_gfn_type((d), (g), (t), >> p2m_query) >> -#define get_gfn_guest_unlocked(d, g, t) get_gfn_type((d), (g), (t), >> p2m_guest) >> -#define get_gfn_unshare_unlocked(d, g, t) get_gfn_type((d), (g), (t), >> p2m_unshare) >> +/* The intent is to have the caller not worry about put_gfn. They apply >> to >> + * very specific situations: debug printk''s, dumps during a domain >> crash, >> + * or to peek at a p2m entry/type. Caller is not holding the p2m entry >> + * exclusively after calling this. */ >> +#define build_unlocked_accessor(name) >> \ >> + static inline mfn_t get_gfn ## name ## _unlocked(struct domain *d, >> \ >> + unsigned long gfn, >> \ >> + p2m_type_t *t) >> \ >> + { >> \ >> + mfn_t mfn = get_gfn ##name (d, gfn, t); >> \ >> + put_gfn(d, gfn); >> \ >> + return mfn; >> \ >> + } >> + >> +build_unlocked_accessor() >> +build_unlocked_accessor(_query) >> +build_unlocked_accessor(_guest) >> +build_unlocked_accessor(_unshare) >> >> /* General conversion function from mfn to gfn */ >> static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn) >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2011-Nov-24  11:35 UTC
Re: [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
At 10:03 -0800 on 14 Nov (1321264988), Andres Lagar-Cavilla wrote:> Hi there, > > At 16:42 -0500 on 08 Nov (1320770545), Andres Lagar-Cavilla wrote: > >> xen/arch/x86/mm/mm-locks.h | 13 +++++++------ > >> xen/arch/x86/mm/p2m.c | 18 +++++++++++++++++- > >> xen/include/asm-x86/p2m.h | 39 > >> ++++++++++++++++++++++++--------------- > >> 3 files changed, 48 insertions(+), 22 deletions(-) > >> > >> > >> We achieve this by locking/unlocking the global p2m_lock in get/put_gfn. > >> This brings about a few consequences for the p2m_lock: > >> > >> - not ordered anymore in mm-locks.h: unshare does get_gfn -> shr_lock, > >> there are code paths that do paging_lock -> get_gfn. All of these > >> would cause mm-locks.h to panic. > > > > In that case there''s a potential deadlock in the sharing code, and > > turning off the safety catches is not an acceptable response to that. :) > > > > ISTR you had a plan to get rid of the shr_lock entirely, or am > > I misremembering? > Sharing is actually fine, I can reorder those safely until I get rid of > the shr_lock. Except for sharing audits, which basically lock the whole > hypervisor, and _no one is using at all_. > > I have a more fundamental problem with the paging lock. sh_update_cr3 can > be called from a variety of situations. It will walk the four top level > PAE mappings, acquiring the p2m entry for each, with the paging lock held. > This is an inversion & deadlock, if I try to synchronize p2m lookups with > the p2m lock.Is sh_update_cr3() really called with p2m locks/refs held? Since it doesn''t take a frame number as an argument it might be possible to shuffle things around at the callers.> Any suggestions here? Other than disabling ordering of the p2m lock?Disabling the ordering won''t do, so we need to find something else. Tim.
Andres Lagar-Cavilla
2011-Nov-24  16:41 UTC
Re: [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
> At 10:03 -0800 on 14 Nov (1321264988), Andres Lagar-Cavilla wrote: >> Hi there, >> > At 16:42 -0500 on 08 Nov (1320770545), Andres Lagar-Cavilla wrote: >> >> xen/arch/x86/mm/mm-locks.h | 13 +++++++------ >> >> xen/arch/x86/mm/p2m.c | 18 +++++++++++++++++- >> >> xen/include/asm-x86/p2m.h | 39 >> >> ++++++++++++++++++++++++--------------- >> >> 3 files changed, 48 insertions(+), 22 deletions(-) >> >> >> >> >> >> We achieve this by locking/unlocking the global p2m_lock in >> get/put_gfn. >> >> This brings about a few consequences for the p2m_lock: >> >> >> >> - not ordered anymore in mm-locks.h: unshare does get_gfn -> >> shr_lock, >> >> there are code paths that do paging_lock -> get_gfn. All of these >> >> would cause mm-locks.h to panic. >> > >> > In that case there''s a potential deadlock in the sharing code, and >> > turning off the safety catches is not an acceptable response to that. >> :) >> > >> > ISTR you had a plan to get rid of the shr_lock entirely, or am >> > I misremembering? >> Sharing is actually fine, I can reorder those safely until I get rid of >> the shr_lock. Except for sharing audits, which basically lock the whole >> hypervisor, and _no one is using at all_. >> >> I have a more fundamental problem with the paging lock. sh_update_cr3 >> can >> be called from a variety of situations. It will walk the four top level >> PAE mappings, acquiring the p2m entry for each, with the paging lock >> held. >> This is an inversion & deadlock, if I try to synchronize p2m lookups >> with >> the p2m lock. > > Is sh_update_cr3() really called with p2m locks/refs held? Since it > doesn''t take a frame number as an argument it might be possible to > shuffle things around at the callers.Ok, I''ve refined this a bit. There are several instances of code that takes the paging_lock and later needs to perform a p2m lookup. Apart from the previously mentioned example here are two others: hap_update_paging_modes -> ... -> vmx_load_pdptrs -> get_gfn(guest_cr3) sh_x86_emulate_* -> ... -> validate_gl?e -> get_gfn_query() None of these functions are called with p2m locks/refs held, and likely they shouldn''t. However, they will result in a deadlock panic situation, if get_gfn takes a lock. Here is one solution to consider: do not lock the p2m on lookups in shadow mode. Shadow mode does not support paging out and sharing of pages, which are primary reasons why we want synchronized p2m lookups. The hap cases where there is an inversion of the p2m_lock -> paging_lock order are reasonably simple to handle. The other option is to invert the order and place paging_lock -> p2m_lock, but that will raise another set of potential inversions. I think this is a no go. Finally, one could audit all these calls and have them perform futurology, get_gfn the gfn they will end up perhaps looking up, before taking the paging_lock. I also think this is a no go. All very unsavory. Andres> >> Any suggestions here? Other than disabling ordering of the p2m lock? > > Disabling the ordering won''t do, so we need to find something else. > > Tim. > >
Tim Deegan
2011-Dec-01  13:59 UTC
Re: [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
At 08:41 -0800 on 24 Nov (1322124101), Andres Lagar-Cavilla wrote:> Here is one solution to consider: do not lock the p2m on lookups in shadow > mode. Shadow mode does not support paging out and sharing of pages, which > are primary reasons why we want synchronized p2m lookups. The hap cases > where there is an inversion of the p2m_lock -> paging_lock order are > reasonably simple to handle. > > The other option is to invert the order and place paging_lock -> p2m_lock, > but that will raise another set of potential inversions. I think this is a > no go.Is there scope for having the p2m lock split out into the overall one (needed for allocations &c) and the per-record ones, and have them at different levels in the lock hierarchy? Tim.
Andres Lagar-Cavilla
2011-Dec-02  16:33 UTC
Re: [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
> At 08:41 -0800 on 24 Nov (1322124101), Andres Lagar-Cavilla wrote: >> Here is one solution to consider: do not lock the p2m on lookups in >> shadow >> mode. Shadow mode does not support paging out and sharing of pages, >> which >> are primary reasons why we want synchronized p2m lookups. The hap cases >> where there is an inversion of the p2m_lock -> paging_lock order are >> reasonably simple to handle. >> >> The other option is to invert the order and place paging_lock -> >> p2m_lock, >> but that will raise another set of potential inversions. I think this is >> a >> no go. > > Is there scope for having the p2m lock split out into the overall one > (needed for allocations &c) and the per-record ones, and have them at > different levels in the lock hierarchy?I think it''s saner to go with one big lock and then shard it if we run into performance problems. Your proposal, actually :) Now, I''m thinking of splitting ... the paging lock. It covers way too much ground right now, and many of these inversions would become more tractable. Thinking of paging_alloc_lock, paging_logdirty_lock, paging_lock. Thanks, Andres> > Tim. >
Tim Deegan
2011-Dec-02  17:38 UTC
Re: [PATCH 2 of 3] Make p2m lookups fully synchronized wrt modifications
At 08:33 -0800 on 02 Dec (1322814798), Andres Lagar-Cavilla wrote:> > At 08:41 -0800 on 24 Nov (1322124101), Andres Lagar-Cavilla wrote: > >> Here is one solution to consider: do not lock the p2m on lookups in > >> shadow > >> mode. Shadow mode does not support paging out and sharing of pages, > >> which > >> are primary reasons why we want synchronized p2m lookups. The hap cases > >> where there is an inversion of the p2m_lock -> paging_lock order are > >> reasonably simple to handle. > >> > >> The other option is to invert the order and place paging_lock -> > >> p2m_lock, > >> but that will raise another set of potential inversions. I think this is > >> a > >> no go. > > > > Is there scope for having the p2m lock split out into the overall one > > (needed for allocations &c) and the per-record ones, and have them at > > different levels in the lock hierarchy? > I think it''s saner to go with one big lock and then shard it if we run > into performance problems. Your proposal, actually :) > > Now, I''m thinking of splitting ... the paging lock. It covers way too much > ground right now, and many of these inversions would become more > tractable. Thinking of paging_alloc_lock, paging_logdirty_lock, > paging_lock....almost exactly six months after I merged the shadow, hap and log-dirty locks into one lock, to avoid the deadlocks that were caused by those sections of code calling in to each other. :) http://xenbits.xen.org/hg/staging/xen-unstable.hg/log?rev=2bbed Tim.