thr3ads.net - Xen devel - [PATCH 00 of 18] Memory sharing overhaul [Dec 2011]

If this information is useful, please help other people find it:
Share via:

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

[PATCH 00 of 18] Memory sharing overhaul

This patch series proposes an overhaul of the memory sharing code.

Aside from bug fixes and cleanups, the main features are:
- Polling of stats via libxc, libxl and console
- Removal of global sharing hashtable and global sharing lock 
 (if audit disabled)
- Turned sharing audits into a domctl
- New domctl to populate vacant physmap entries with shared 
 pages.

As a result, the domctl interface to sharing changes. The only in-tree
consumer of this interface is updated in the current series. It is 
important that if any out-of-tree consumer exists, that they state
their opinion on this interface change.

Patches 5 to 8, 10, 11, 15 and 18 are tools patches.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>

 xen/arch/x86/mm.c                     |    6 +-
 xen/arch/x86/mm/mem_sharing.c         |   91 +++--
 xen/arch/x86/mm.c                     |    2 +-
 xen/arch/x86/mm/mem_sharing.c         |  526 +++++++++++++++++----------------
 xen/include/asm-x86/mem_sharing.h     |   15 +-
 xen/include/asm-x86/mm.h              |   11 +-
 xen/include/public/domctl.h           |    3 +
 xen/arch/x86/mm/mem_sharing.c         |   65 +++-
 xen/include/public/domctl.h           |    9 +
 tools/libxc/xc_memshr.c               |    3 +-
 tools/libxc/xenctrl.h                 |    1 +
 tools/libxc/xc_memshr.c               |   19 +-
 tools/libxc/xenctrl.h                 |    7 +-
 tools/blktap2/drivers/Makefile        |    2 +-
 tools/blktap2/drivers/tapdisk-image.c |    2 +-
 tools/blktap2/drivers/tapdisk-vbd.c   |    6 +-
 tools/blktap2/drivers/tapdisk.h       |    6 +-
 tools/memshr/bidir-daemon.c           |    4 +
 tools/memshr/bidir-hash.h             |   13 +-
 tools/memshr/interface.c              |   31 +-
 tools/memshr/memshr.h                 |   11 +-
 tools/libxl/xl.h                      |    1 +
 tools/libxl/xl_cmdimpl.c              |   85 +++++
 tools/libxl/xl_cmdtable.c             |    6 +
 xen/arch/x86/mm.c                     |    6 -
 xen/arch/x86/mm/mem_sharing.c         |   31 +-
 xen/arch/x86/x86_64/compat/mm.c       |    6 +
 xen/arch/x86/x86_64/mm.c              |    7 +
 xen/include/asm-x86/mem_sharing.h     |    1 +
 xen/include/public/memory.h           |    1 +
 tools/libxc/xc_private.c              |   10 +
 tools/libxc/xenctrl.h                 |    4 +
 tools/libxl/Makefile                  |    2 +-
 tools/libxl/xl_cmdimpl.c              |   13 +-
 xen/arch/x86/mm.c                     |    4 +-
 xen/include/asm-x86/mm.h              |    3 +
 xen/arch/x86/mm.c                     |   16 +-
 xen/arch/x86/mm/mem_sharing.c         |  169 +++++++++-
 xen/arch/x86/mm/mm-locks.h            |    6 +-
 xen/include/asm-x86/mm.h              |    2 +-
 xen/arch/x86/mm/mem_sharing.c         |  106 ++++++
 xen/include/public/domctl.h           |    3 +-
 tools/libxc/xc_memshr.c               |   23 +
 tools/libxc/xenctrl.h                 |    6 +
 xen/arch/ia64/xen/mm.c                |    6 +
 xen/arch/x86/mm/mem_sharing.c         |    8 +
 xen/common/keyhandler.c               |    7 +-
 xen/include/xen/mm.h                  |    3 +
 xen/arch/x86/mm/mem_sharing.c         |   17 +-
 xen/include/public/domctl.h           |    1 +
 tools/libxc/xc_memshr.c               |   14 +
 tools/libxc/xenctrl.h                 |    2 +
 52 files changed, 1005 insertions(+), 397 deletions(-)

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

xen/arch/x86/mm.c             |   6 +-
 xen/arch/x86/mm/mem_sharing.c |  91 ++++++++++++++++++++++--------------------
 2 files changed, 50 insertions(+), 47 deletions(-)


Harmless patch, with no functional changes, only code style issues.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 62c342fd7b9c -r aeebbff899ff xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4321,9 +4321,9 @@ int page_make_sharable(struct domain *d,
         return -EEXIST;
     }
 
-    /* Check if the ref count is 2. The first from PGT_allocated, and
+    /* Check if the ref count is 2. The first from PGC_allocated, and
      * the second from get_page_and_type at the top of this function */
-    if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))
+    if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) )
     {
         /* Return type count back to zero */
         put_page_and_type(page);
@@ -4340,7 +4340,7 @@ int page_make_sharable(struct domain *d,
 
 int page_make_private(struct domain *d, struct page_info *page)
 {
-    if(!get_page(page, dom_cow))
+    if ( !get_page(page, dom_cow) )
         return -EINVAL;
     
     spin_lock(&d->page_alloc_lock);
diff -r 62c342fd7b9c -r aeebbff899ff xen/arch/x86/mm/mem_sharing.c
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -35,14 +35,14 @@
 #include "mm-locks.h"
 
 /* Auditing of memory sharing code? */
-#define MEM_SHARING_AUDIT  0
+#define MEM_SHARING_AUDIT 0
 
 #if MEM_SHARING_AUDIT
 static void mem_sharing_audit(void);
 #define MEM_SHARING_DEBUG(_f, _a...)                                  \
     debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a)
 #else
-# define mem_sharing_audit() do {} while(0)
+#define mem_sharing_audit() do {} while(0)
 #endif /* MEM_SHARING_AUDIT */
 
 #define mem_sharing_enabled(d) \
@@ -56,7 +56,7 @@ static void mem_sharing_audit(void);
 #define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg))
 
 static shr_handle_t next_handle = 1;
-static atomic_t nr_saved_mfns = ATOMIC_INIT(0); 
+static atomic_t nr_saved_mfns   = ATOMIC_INIT(0); 
 
 typedef struct shr_hash_entry 
 {
@@ -84,9 +84,9 @@ static inline int list_has_one_entry(str
     return (head->next != head) && (head->next->next == head);
 }
 
-static inline struct gfn_info* gfn_get_info(struct list_head *list)
+static inline gfn_info_t *gfn_get_info(struct list_head *list)
 {
-    return list_entry(list->next, struct gfn_info, list);
+    return list_entry(list->next, gfn_info_t, list);
 }
 
 static void __init mem_sharing_hash_init(void)
@@ -116,12 +116,12 @@ static gfn_info_t *mem_sharing_gfn_alloc
 static void mem_sharing_gfn_destroy(gfn_info_t *gfn, int was_shared)
 {
     /* Decrement the number of pages, if the gfn was shared before */
-    if(was_shared)
+    if ( was_shared )
     {
         struct domain *d = get_domain_by_id(gfn->domain);
-        /* Domain may have been destroyed by now, if we are called from
-         * p2m_teardown */
-        if(d)
+        /* Domain may have been destroyed by now *
+         * (if we are called from p2m_teardown)  */
+        if ( d )
         {
             atomic_dec(&d->shr_pages);
             put_domain(d);
@@ -261,7 +261,8 @@ static struct page_info* mem_sharing_all
     mem_event_request_t req;
 
     page = alloc_domheap_page(d, 0); 
-    if(page != NULL) return page;
+    if ( page != NULL )
+        return page;
 
     memset(&req, 0, sizeof(req));
     req.type = MEM_EVENT_TYPE_SHARED;
@@ -301,7 +302,7 @@ static struct page_info* mem_sharing_all
 
 unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
-    return (unsigned int)atomic_read(&nr_saved_mfns);
+    return ((unsigned int)atomic_read(&nr_saved_mfns));
 }
 
 int mem_sharing_sharing_resume(struct domain *d)
@@ -328,14 +329,15 @@ int mem_sharing_debug_mfn(unsigned long 
 {
     struct page_info *page;
 
-    if(!mfn_valid(_mfn(mfn)))
+    if ( !mfn_valid(_mfn(mfn)) )
     {
-        printk("Invalid MFN=%lx\n", mfn);
+        gdprintk(XENLOG_ERR, "Invalid MFN=%lx\n", mfn);
         return -1;
     }
     page = mfn_to_page(_mfn(mfn));
 
-    printk("Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n",
+    gdprintk(XENLOG_DEBUG, 
+            "Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n",
             mfn_x(page_to_mfn(page)), 
             page->count_info, 
             page->u.inuse.type_info,
@@ -351,9 +353,9 @@ int mem_sharing_debug_gfn(struct domain 
 
     mfn = get_gfn_unlocked(d, gfn, &p2mt);
 
-    printk("Debug for domain=%d, gfn=%lx, ", 
-            d->domain_id, 
-            gfn);
+    gdprintk(XENLOG_DEBUG, "Debug for domain=%d, gfn=%lx, ", 
+               d->domain_id, 
+               gfn);
     return mem_sharing_debug_mfn(mfn_x(mfn));
 }
 
@@ -370,8 +372,8 @@ int mem_sharing_debug_gfn(struct domain 
 static grant_entry_header_t *
 shared_entry_header(struct grant_table *t, grant_ref_t ref)
 {
-    ASSERT(t->gt_version != 0);
-    if (t->gt_version == 1)
+    ASSERT (t->gt_version != 0);
+    if ( t->gt_version == 1 )
         return (grant_entry_header_t*)&shared_entry_v1(t, ref);
     else
         return &shared_entry_v2(t, ref).hdr;
@@ -381,25 +383,23 @@ static int mem_sharing_gref_to_gfn(struc
                                    grant_ref_t ref, 
                                    unsigned long *gfn)
 {
-    if(d->grant_table->gt_version < 1)
+    if ( d->grant_table->gt_version < 1 )
         return -1;
 
-    if (d->grant_table->gt_version == 1) 
+    if ( d->grant_table->gt_version == 1 ) 
     {
         grant_entry_v1_t *sha1;
         sha1 = &shared_entry_v1(d->grant_table, ref);
         *gfn = sha1->frame;
-        return 0;
     } 
     else 
     {
         grant_entry_v2_t *sha2;
         sha2 = &shared_entry_v2(d->grant_table, ref);
         *gfn = sha2->full_page.frame;
-        return 0;
     }
  
-    return -2;
+    return 0;
 }
 
 /* Account for a GFN being shared/unshared.
@@ -439,20 +439,22 @@ int mem_sharing_debug_gref(struct domain
     uint16_t status;
     unsigned long gfn;
 
-    if(d->grant_table->gt_version < 1)
+    if ( d->grant_table->gt_version < 1 )
     {
-        printk("Asked to debug [dom=%d,gref=%d], but not yet
inited.\n",
+        gdprintk(XENLOG_ERR, 
+                "Asked to debug [dom=%d,gref=%d], but not yet
inited.\n",
                 d->domain_id, ref);
         return -1;
     }
-    mem_sharing_gref_to_gfn(d, ref, &gfn); 
+    (void)mem_sharing_gref_to_gfn(d, ref, &gfn); 
     shah = shared_entry_header(d->grant_table, ref);
-    if (d->grant_table->gt_version == 1) 
+    if ( d->grant_table->gt_version == 1 ) 
         status = shah->flags;
     else 
         status = status_entry(d->grant_table, ref);
     
-    printk("==> Grant [dom=%d,ref=%d], status=%x. ", 
+    gdprintk(XENLOG_DEBUG,
+            "==> Grant [dom=%d,ref=%d], status=%x. ", 
             d->domain_id, ref, status);
 
     return mem_sharing_debug_gfn(d, gfn); 
@@ -478,24 +480,24 @@ int mem_sharing_nominate_page(struct dom
 
     /* Check if mfn is valid */
     ret = -EINVAL;
-    if (!mfn_valid(mfn))
+    if ( !mfn_valid(mfn) )
         goto out;
 
     /* Return the handle if the page is already shared */
     page = mfn_to_page(mfn);
-    if (p2m_is_shared(p2mt)) {
+    if ( p2m_is_shared(p2mt) ) {
         *phandle = page->shr_handle;
         ret = 0;
         goto out;
     }
 
     /* Check p2m type */
-    if (!p2m_is_sharable(p2mt))
+    if ( !p2m_is_sharable(p2mt) )
         goto out;
 
     /* Try to convert the mfn to the sharable type */
     ret = page_make_sharable(d, page, expected_refcnt); 
-    if(ret) 
+    if ( ret ) 
         goto out;
 
     /* Create the handle */
@@ -512,7 +514,7 @@ int mem_sharing_nominate_page(struct dom
     }
 
     /* Change the p2m type */
-    if(p2m_change_type(d, gfn, p2mt, p2m_ram_shared) != p2mt) 
+    if ( p2m_change_type(d, gfn, p2mt, p2m_ram_shared) != p2mt ) 
     {
         /* This is unlikely, as the type must have changed since we''ve
checked
          * it a few lines above.
@@ -618,7 +620,7 @@ int mem_sharing_unshare_page(struct doma
     mfn = get_gfn(d, gfn, &p2mt);
     
     /* Has someone already unshared it? */
-    if (!p2m_is_shared(p2mt)) {
+    if ( !p2m_is_shared(p2mt) ) {
         put_gfn(d, gfn);
         shr_unlock();
         return 0;
@@ -631,10 +633,11 @@ int mem_sharing_unshare_page(struct doma
     list_for_each(le, &hash_entry->gfns)
     {
         gfn_info = list_entry(le, struct gfn_info, list);
-        if((gfn_info->gfn == gfn) && (gfn_info->domain ==
d->domain_id))
+        if ( (gfn_info->gfn == gfn) && (gfn_info->domain ==
d->domain_id) )
             goto gfn_found;
     }
-    printk("Could not find gfn_info for shared gfn: %lx\n", gfn);
+    gdprintk(XENLOG_ERR, "Could not find gfn_info for shared gfn: "
+                            "%lx\n", gfn);
     BUG();
 gfn_found: 
     /* Delete gfn_info from the list, but hold on to it, until we''ve
allocated
@@ -644,7 +647,7 @@ gfn_found:
 
     /* If the GFN is getting destroyed drop the references to MFN 
      * (possibly freeing the page), and exit early */
-    if(flags & MEM_SHARING_DESTROY_GFN)
+    if ( flags & MEM_SHARING_DESTROY_GFN )
     {
         mem_sharing_gfn_destroy(gfn_info, !last_gfn);
         if(last_gfn) 
@@ -702,8 +705,8 @@ private_page_found:
     else
         atomic_dec(&nr_saved_mfns);
 
-    if(p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != 
-                                                p2m_ram_shared) 
+    if ( p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != 
+                                                p2m_ram_shared ) 
     {
         printk("Could not change p2m type.\n");
         BUG();
@@ -740,7 +743,7 @@ int mem_sharing_domctl(struct domain *d,
         {
             unsigned long gfn = mec->u.nominate.u.gfn;
             shr_handle_t handle;
-            if(!mem_sharing_enabled(d))
+            if ( !mem_sharing_enabled(d) )
                 return -EINVAL;
             rc = mem_sharing_nominate_page(d, gfn, 0, &handle);
             mec->u.nominate.handle = handle;
@@ -753,9 +756,9 @@ int mem_sharing_domctl(struct domain *d,
             unsigned long gfn;
             shr_handle_t handle;
 
-            if(!mem_sharing_enabled(d))
+            if ( !mem_sharing_enabled(d) )
                 return -EINVAL;
-            if(mem_sharing_gref_to_gfn(d, gref, &gfn) < 0)
+            if ( mem_sharing_gref_to_gfn(d, gref, &gfn) < 0 )
                 return -EINVAL;
             rc = mem_sharing_nominate_page(d, gfn, 3, &handle);
             mec->u.nominate.handle = handle;
@@ -772,7 +775,7 @@ int mem_sharing_domctl(struct domain *d,
 
         case XEN_DOMCTL_MEM_EVENT_OP_SHARING_RESUME:
         {
-            if(!mem_sharing_enabled(d))
+            if ( !mem_sharing_enabled(d) )
                 return -EINVAL;
             rc = mem_sharing_sharing_resume(d);
         }

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 02 of 18] x86/mm: Making a page sharable sets PGT_validated, but making a page private doesn''t expect it

xen/arch/x86/mm.c |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r aeebbff899ff -r 61da3fc46f06 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4347,7 +4347,7 @@ int page_make_private(struct domain *d, 
 
     /* We can only change the type if count is one */
     if ( (page->u.inuse.type_info & (PGT_type_mask | PGT_count_mask))
-         != (PGT_shared_page | 1) )
+         != (PGT_shared_page | PGT_validated | 1) )
     {
         put_page(page);
         spin_unlock(&d->page_alloc_lock);

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 03 of 18] x86/mm: Eliminate hash table in sharing code as index of shared mfns

xen/arch/x86/mm/mem_sharing.c     |  526 +++++++++++++++++++------------------
 xen/include/asm-x86/mem_sharing.h |   15 +-
 xen/include/asm-x86/mm.h          |   11 +-
 xen/include/public/domctl.h       |    3 +
 4 files changed, 296 insertions(+), 259 deletions(-)


The hashtable mechanism used by the sharing code is a bit odd.  It seems
unnecessary and adds complexity.  Instead, we replace this by storing a list
head directly in the page info for the case when the page is shared.  This does
not add any extra space to the page_info and serves to remove significant
complexity from sharing.

Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 61da3fc46f06 -r 3038770886aa xen/arch/x86/mm/mem_sharing.c
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -3,6 +3,7 @@
  *
  * Memory sharing support.
  *
+ * Copyright (c) 2011 GridCentric, Inc. (Adin Scannell & Andres
Lagar-Cavilla)
  * Copyright (c) 2009 Citrix Systems, Inc. (Grzegorz Milos)
  *
  * This program is free software; you can redistribute it and/or modify
@@ -38,11 +39,28 @@
 #define MEM_SHARING_AUDIT 0
 
 #if MEM_SHARING_AUDIT
-static void mem_sharing_audit(void);
+static int mem_sharing_audit(void);
 #define MEM_SHARING_DEBUG(_f, _a...)                                  \
     debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a)
+static struct list_head shr_audit_list;
+
+static inline void audit_add_list(struct page_info *page)
+{
+    list_add(&page->shared_info->entry, &shr_audit_list);
+}
+
+static inline void audit_del_list(struct page_info *page)
+{
+    list_del(&page->shared_info->entry);
+}
 #else
-#define mem_sharing_audit() do {} while(0)
+static inline int mem_sharing_audit(void)
+{
+    return 0;
+}
+
+#define audit_add_list(p)  ((void)0)
+#define audit_del_list(p)  ((void)0)
 #endif /* MEM_SHARING_AUDIT */
 
 #define mem_sharing_enabled(d) \
@@ -58,17 +76,6 @@ static void mem_sharing_audit(void);
 static shr_handle_t next_handle = 1;
 static atomic_t nr_saved_mfns   = ATOMIC_INIT(0); 
 
-typedef struct shr_hash_entry 
-{
-    shr_handle_t handle;
-    mfn_t mfn; 
-    struct shr_hash_entry *next;
-    struct list_head gfns;
-} shr_hash_entry_t;
-
-#define SHR_HASH_LENGTH 1000
-static shr_hash_entry_t *shr_hash[SHR_HASH_LENGTH];
-
 typedef struct gfn_info
 {
     unsigned long gfn;
@@ -89,36 +96,30 @@ static inline gfn_info_t *gfn_get_info(s
     return list_entry(list->next, gfn_info_t, list);
 }
 
-static void __init mem_sharing_hash_init(void)
+static inline gfn_info_t *mem_sharing_gfn_alloc(struct page_info *page,
+                                         struct domain *d,
+                                         unsigned long gfn)
 {
-    int i;
+    gfn_info_t *gfn_info = xmalloc(gfn_info_t);
 
-    mm_lock_init(&shr_lock);
-    for(i=0; i<SHR_HASH_LENGTH; i++)
-        shr_hash[i] = NULL;
+    if ( gfn_info == NULL )
+        return NULL; 
+
+    gfn_info->gfn = gfn;
+    gfn_info->domain = d->domain_id;
+    INIT_LIST_HEAD(&gfn_info->list);
+    list_add(&gfn_info->list, &page->shared_info->gfns);
+
+    return gfn_info;
 }
 
-static shr_hash_entry_t *mem_sharing_hash_alloc(void)
-{
-    return xmalloc(shr_hash_entry_t); 
-}
-
-static void mem_sharing_hash_destroy(shr_hash_entry_t *e)
-{
-    xfree(e);
-}
-
-static gfn_info_t *mem_sharing_gfn_alloc(void)
-{
-    return xmalloc(gfn_info_t); 
-}
-
-static void mem_sharing_gfn_destroy(gfn_info_t *gfn, int was_shared)
+static inline void mem_sharing_gfn_destroy(gfn_info_t *gfn_info,
+                                    int was_shared)
 {
     /* Decrement the number of pages, if the gfn was shared before */
     if ( was_shared )
     {
-        struct domain *d = get_domain_by_id(gfn->domain);
+        struct domain *d = get_domain_by_id(gfn_info->domain);
         /* Domain may have been destroyed by now *
          * (if we are called from p2m_teardown)  */
         if ( d )
@@ -127,128 +128,118 @@ static void mem_sharing_gfn_destroy(gfn_
             put_domain(d);
         }
     }
-    xfree(gfn);
+
+    list_del(&gfn_info->list);
+    xfree(gfn_info);
 }
 
-static shr_hash_entry_t* mem_sharing_hash_lookup(shr_handle_t handle)
+static struct page_info* mem_sharing_lookup(unsigned long mfn)
 {
-    shr_hash_entry_t *e;
-    
-    e = shr_hash[handle % SHR_HASH_LENGTH]; 
-    while(e != NULL)
+    if ( mfn_valid(_mfn(mfn)) )
     {
-        if(e->handle == handle)
-            return e;
-        e = e->next;
+        struct page_info* page = mfn_to_page(_mfn(mfn));
+        if ( page_get_owner(page) == dom_cow )
+        {
+            ASSERT(page->u.inuse.type_info & PGT_type_mask); 
+            ASSERT(get_gpfn_from_mfn(mfn) == SHARED_M2P_ENTRY); 
+            return page;
+        }
     }
 
     return NULL;
 }
 
-static shr_hash_entry_t* mem_sharing_hash_insert(shr_handle_t handle, mfn_t
mfn)
+#if MEM_SHARING_AUDIT
+static int mem_sharing_audit(void)
 {
-    shr_hash_entry_t *e, **ee;
-    
-    e = mem_sharing_hash_alloc();
-    if(e == NULL) return NULL;
-    e->handle = handle;
-    e->mfn = mfn;
-    ee = &shr_hash[handle % SHR_HASH_LENGTH]; 
-    e->next = *ee;
-    *ee = e;
-    return e;
-}
-
-static void mem_sharing_hash_delete(shr_handle_t handle)
-{
-    shr_hash_entry_t **pprev, *e;  
-
-    pprev = &shr_hash[handle % SHR_HASH_LENGTH];
-    e = *pprev;
-    while(e != NULL)
-    {
-        if(e->handle == handle)
-        {
-            *pprev = e->next;
-            mem_sharing_hash_destroy(e);
-            return;
-        }
-        pprev = &e->next;
-        e = e->next;
-    }
-    printk("Could not find shr entry for handle
%"PRIx64"\n", handle);
-    BUG();
-} 
-
-#if MEM_SHARING_AUDIT
-static void mem_sharing_audit(void)
-{
-    shr_hash_entry_t *e;
-    struct list_head *le;
-    gfn_info_t *g;
-    int bucket;
-    struct page_info *pg;
+    int errors = 0;
+    struct list_head *ae;
 
     ASSERT(shr_locked_by_me());
 
-    for(bucket=0; bucket < SHR_HASH_LENGTH; bucket++)
+    list_for_each(ae, &shr_audit_list)
     {
-        e = shr_hash[bucket];    
-        /* Loop over all shr_hash_entries */ 
-        while(e != NULL)
+        struct page_sharing_info *shared_info;
+        unsigned long nr_gfns = 0;
+        struct page_info *pg;
+        struct list_head *le;
+        mfn_t mfn;
+
+        shared_info = list_entry(ae, struct page_sharing_info, entry);
+        pg = shared_info->pg;
+        mfn = page_to_mfn(pg);
+
+        /* Check if the MFN has correct type, owner and handle. */ 
+        if ( !(pg->u.inuse.type_info & PGT_shared_page) )
         {
-            int nr_gfns=0;
+           MEM_SHARING_DEBUG("mfn %lx in audit list, but not
PGT_shared_page (%d)!\n",
+                              mfn_x(mfn), pg->u.inuse.type_info &
PGT_type_mask);
+           errors++;
+           continue;
+        }
 
-            /* Check if the MFN has correct type, owner and handle */ 
-            pg = mfn_to_page(e->mfn);
-            if((pg->u.inuse.type_info & PGT_type_mask) !=
PGT_shared_page)
-                MEM_SHARING_DEBUG("mfn %lx not shared, but in the
hash!\n",
-                                   mfn_x(e->mfn));
-            if(page_get_owner(pg) != dom_cow)
-                MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner
(%d)!\n",
-                                   mfn_x(e->mfn), 
-                                   page_get_owner(pg)->domain_id);
-            if(e->handle != pg->shr_handle)
-                MEM_SHARING_DEBUG("mfn %lx shared, but wrong handle "
-                                  "(%ld != %ld)!\n",
-                                   mfn_x(e->mfn), pg->shr_handle,
e->handle);
-            /* Check if all GFNs map to the MFN, and the p2m types */
-            list_for_each(le, &e->gfns)
+        /* Check the page owner. */
+        if ( page_get_owner(pg) != dom_cow )
+        {
+           MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner
(%d)!\n",
+                             mfn_x(mfn), page_get_owner(pg)->domain_id);
+           errors++;
+        }
+
+        /* Check the m2p entry */
+        if ( get_gpfn_from_mfn(mfn_x(mfn)) != SHARED_M2P_ENTRY )
+        {
+           MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry
(%lx)!\n",
+                             mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
+           errors++;
+        }
+
+        /* Check if all GFNs map to the MFN, and the p2m types */
+        list_for_each(le, &pg->shared_info->gfns)
+        {
+            struct domain *d;
+            p2m_type_t t;
+            mfn_t o_mfn;
+            gfn_info_t *g;
+
+            g = list_entry(le, gfn_info_t, list);
+            d = get_domain_by_id(g->domain);
+            if ( d == NULL )
             {
-                struct domain *d;
-                p2m_type_t t;
-                mfn_t mfn;
-
-                g = list_entry(le, struct gfn_info, list);
-                d = get_domain_by_id(g->domain);
-                if(d == NULL)
-                {
-                    MEM_SHARING_DEBUG("Unknow dom: %d, for PFN=%lx,
MFN=%lx\n",
-                            g->domain, g->gfn, mfn_x(e->mfn));
-                    continue;
-                }
-                mfn = get_gfn_unlocked(d, g->gfn, &t); 
-                if(mfn_x(mfn) != mfn_x(e->mfn))
-                    MEM_SHARING_DEBUG("Incorrect P2M for d=%d,
PFN=%lx."
-                                      "Expecting MFN=%ld, got %ld\n",
-                                      g->domain, g->gfn,
mfn_x(e->mfn),
-                                      mfn_x(mfn));
-                if(t != p2m_ram_shared)
-                    MEM_SHARING_DEBUG("Incorrect P2M type for d=%d,
PFN=%lx."
-                                      "Expecting t=%d, got %d\n",
-                                      g->domain, g->gfn,
mfn_x(e->mfn),
-                                      p2m_ram_shared, t);
-                put_domain(d);
-                nr_gfns++;
-            } 
-            if(nr_gfns != (pg->u.inuse.type_info & PGT_count_mask))
-                MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx."
-                                  "nr_gfns in hash %d, in type_info
%d\n",
-                                  mfn_x(e->mfn), nr_gfns, 
-                                 (pg->u.inuse.type_info &
PGT_count_mask));
-            e = e->next;
+                MEM_SHARING_DEBUG("Unknown dom: %d, for PFN=%lx,
MFN=%lx\n",
+                                  g->domain, g->gfn, mfn);
+                errors++;
+                continue;
+            }
+            o_mfn = get_gfn_query_unlocked(d, g->gfn, &t); 
+            if ( mfn_x(o_mfn) != mfn_x(mfn) )
+            {
+                MEM_SHARING_DEBUG("Incorrect P2M for d=%d, PFN=%lx."
+                                  "Expecting MFN=%ld, got %ld\n",
+                                  g->domain, g->gfn, mfn, mfn_x(o_mfn));
+                errors++;
+            }
+            if ( t != p2m_ram_shared )
+            {
+                MEM_SHARING_DEBUG("Incorrect P2M type for d=%d,
PFN=%lx."
+                                  "Expecting t=%d, got %d\n",
+                                  g->domain, g->gfn, mfn, p2m_ram_shared,
t);
+                errors++;
+            }
+            put_domain(d);
+            nr_gfns++;
+        }
+        if ( nr_gfns != (pg->u.inuse.type_info & PGT_count_mask) )
+        {
+            MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx."
+                              "nr_gfns in hash %d, in type_info
%d\n",
+                              mfn_x(mfn), nr_gfns, 
+                              (pg->u.inuse.type_info & PGT_count_mask));
+            errors++;
         }
     }
+
+    return errors;
 }
 #endif
 
@@ -402,36 +393,6 @@ static int mem_sharing_gref_to_gfn(struc
     return 0;
 }
 
-/* Account for a GFN being shared/unshared.
- * When sharing this function needs to be called _before_ gfn lists are merged
- * together, but _after_ gfn is removed from the list when unsharing.
- */
-static int mem_sharing_gfn_account(struct gfn_info *gfn, int sharing)
-{
-    struct domain *d;
-
-    /* A) When sharing:
-     * if the gfn being shared is in > 1 long list, its already been 
-     * accounted for
-     * B) When unsharing:
-     * if the list is longer than > 1, we don''t have to account
yet.
-     */
-    if(list_has_one_entry(&gfn->list))
-    {
-        d = get_domain_by_id(gfn->domain);
-        BUG_ON(!d);
-        if(sharing) 
-            atomic_inc(&d->shr_pages);
-        else
-            atomic_dec(&d->shr_pages);
-        put_domain(d);
-
-        return 1;
-    }
-    mem_sharing_audit();
-
-    return 0;
-}
 
 int mem_sharing_debug_gref(struct domain *d, grant_ref_t ref)
 {
@@ -469,8 +430,6 @@ int mem_sharing_nominate_page(struct dom
     mfn_t mfn;
     struct page_info *page;
     int ret;
-    shr_handle_t handle;
-    shr_hash_entry_t *hash_entry;
     struct gfn_info *gfn_info;
 
     *phandle = 0UL;
@@ -486,7 +445,7 @@ int mem_sharing_nominate_page(struct dom
     /* Return the handle if the page is already shared */
     page = mfn_to_page(mfn);
     if ( p2m_is_shared(p2mt) ) {
-        *phandle = page->shr_handle;
+        *phandle = page->shared_info->handle;
         ret = 0;
         goto out;
     }
@@ -500,16 +459,27 @@ int mem_sharing_nominate_page(struct dom
     if ( ret ) 
         goto out;
 
-    /* Create the handle */
+    /* Initialize the shared state */
     ret = -ENOMEM;
-    handle = next_handle++;  
-    if((hash_entry = mem_sharing_hash_insert(handle, mfn)) == NULL)
+    if ( (page->shared_info = 
+            xmalloc(struct page_sharing_info)) == NULL )
     {
+        BUG_ON(page_make_private(d, page) != 0);
         goto out;
     }
-    if((gfn_info = mem_sharing_gfn_alloc()) == NULL)
+    page->shared_info->pg = page;
+    INIT_LIST_HEAD(&page->shared_info->gfns);
+    INIT_LIST_HEAD(&page->shared_info->entry);
+
+    /* Create the handle */
+    page->shared_info->handle = next_handle++;  
+
+    /* Create the local gfn info */
+    if ( (gfn_info = mem_sharing_gfn_alloc(page, d, gfn)) == NULL )
     {
-        mem_sharing_hash_destroy(hash_entry);
+        xfree(page->shared_info);
+        page->shared_info = NULL;
+        BUG_ON(page_make_private(d, page) != 0);
         goto out;
     }
 
@@ -520,23 +490,19 @@ int mem_sharing_nominate_page(struct dom
          * it a few lines above.
          * The mfn needs to revert back to rw type. This should never fail,
          * since no-one knew that the mfn was temporarily sharable */
+        mem_sharing_gfn_destroy(gfn_info, 0);
+        xfree(page->shared_info);
+        page->shared_info = NULL;
+        /* NOTE: We haven''t yet added this to the audit list. */
         BUG_ON(page_make_private(d, page) != 0);
-        mem_sharing_hash_destroy(hash_entry);
-        mem_sharing_gfn_destroy(gfn_info, 0);
         goto out;
     }
 
     /* Update m2p entry to SHARED_M2P_ENTRY */
     set_gpfn_from_mfn(mfn_x(mfn), SHARED_M2P_ENTRY);
 
-    INIT_LIST_HEAD(&hash_entry->gfns);
-    INIT_LIST_HEAD(&gfn_info->list);
-    list_add(&gfn_info->list, &hash_entry->gfns);
-    gfn_info->gfn = gfn;
-    gfn_info->domain = d->domain_id;
-    page->shr_handle = handle;
-    *phandle = handle;
-
+    *phandle = page->shared_info->handle;
+    audit_add_list(page);
     ret = 0;
 
 out:
@@ -545,54 +511,92 @@ out:
     return ret;
 }
 
-int mem_sharing_share_pages(shr_handle_t sh, shr_handle_t ch) 
+int mem_sharing_share_pages(struct domain *sd, unsigned long sgfn, shr_handle_t
sh,
+                            struct domain *cd, unsigned long cgfn, shr_handle_t
ch)
 {
-    shr_hash_entry_t *se, *ce;
     struct page_info *spage, *cpage;
     struct list_head *le, *te;
-    struct gfn_info *gfn;
+    gfn_info_t *gfn;
     struct domain *d;
-    int ret;
+    int ret = -EINVAL, single_client_gfn, single_source_gfn;
+    mfn_t smfn, cmfn;
+    p2m_type_t smfn_type, cmfn_type;
 
     shr_lock();
 
+    /* XXX if sd == cd handle potential deadlock by ordering
+     * the get_ and put_gfn''s */
+    smfn = get_gfn(sd, sgfn, &smfn_type);
+    cmfn = get_gfn(cd, cgfn, &cmfn_type);
+
     ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID;
-    se = mem_sharing_hash_lookup(sh);
-    if(se == NULL) goto err_out;
+    spage = mem_sharing_lookup(mfn_x(smfn));
+    if ( spage == NULL )
+        goto err_out;
+    ASSERT(smfn_type == p2m_ram_shared);
     ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID;
-    ce = mem_sharing_hash_lookup(ch);
-    if(ce == NULL) goto err_out;
-    spage = mfn_to_page(se->mfn); 
-    cpage = mfn_to_page(ce->mfn); 
-    /* gfn lists always have at least one entry => save to call list_entry
*/
-    mem_sharing_gfn_account(gfn_get_info(&ce->gfns), 1);
-    mem_sharing_gfn_account(gfn_get_info(&se->gfns), 1);
-    list_for_each_safe(le, te, &ce->gfns)
+    cpage = mem_sharing_lookup(mfn_x(cmfn));
+    if ( cpage == NULL )
+        goto err_out;
+    ASSERT(cmfn_type == p2m_ram_shared);
+
+    /* Check that the handles match */
+    if ( spage->shared_info->handle != sh )
     {
-        gfn = list_entry(le, struct gfn_info, list);
-        /* Get the source page and type, this should never fail 
-         * because we are under shr lock, and got non-null se */
+        ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID;
+        goto err_out;
+    }
+    if ( cpage->shared_info->handle != ch )
+    {
+        ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID;
+        goto err_out;
+    }
+
+    /* Merge the lists together */
+    single_source_gfn =
list_has_one_entry(&spage->shared_info->gfns);
+    single_client_gfn =
list_has_one_entry(&cpage->shared_info->gfns);
+    list_for_each_safe(le, te, &cpage->shared_info->gfns)
+    {
+        gfn = list_entry(le, gfn_info_t, list);
+        /* Get the source page and type, this should never fail: 
+         * we are under shr lock, and got a successful lookup */
         BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page));
-        /* Move the gfn_info from ce list to se list */
+        /* Move the gfn_info from client list to source list */
         list_del(&gfn->list);
+        list_add(&gfn->list, &spage->shared_info->gfns);
+        put_page_and_type(cpage);
         d = get_domain_by_id(gfn->domain);
         BUG_ON(!d);
-        BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn) == 0);
+        BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn) == 0);
+        if ( single_client_gfn )
+        {
+            /* Only increase the per-domain count when we are actually
+             * sharing. And don''t increase it should we ever re-share
*/
+            atomic_inc(&d->shr_pages);
+            ASSERT( cd == d );
+        }
         put_domain(d);
-        list_add(&gfn->list, &se->gfns);
-        put_page_and_type(cpage);
-    } 
-    ASSERT(list_empty(&ce->gfns));
-    mem_sharing_hash_delete(ch);
-    atomic_inc(&nr_saved_mfns);
+    }
+    ASSERT(list_empty(&cpage->shared_info->gfns));
+    if ( single_source_gfn )
+        atomic_inc(&sd->shr_pages);
+
+    /* Clear the rest of the shared state */
+    audit_del_list(cpage);
+    xfree(cpage->shared_info);
+    cpage->shared_info = NULL;
+
     /* Free the client page */
     if(test_and_clear_bit(_PGC_allocated, &cpage->count_info))
         put_page(cpage);
+
+    atomic_inc(&nr_saved_mfns);
     ret = 0;
     
 err_out:
+    put_gfn(cd, cgfn);
+    put_gfn(sd, sgfn);
     shr_unlock();
-
     return ret;
 }
 
@@ -604,13 +608,9 @@ int mem_sharing_unshare_page(struct doma
     mfn_t mfn;
     struct page_info *page, *old_page;
     void *s, *t;
-    int ret, last_gfn;
-    shr_hash_entry_t *hash_entry;
-    struct gfn_info *gfn_info = NULL;
-    shr_handle_t handle;
+    int last_gfn;
+    gfn_info_t *gfn_info = NULL;
     struct list_head *le;
-
-    /* Remove the gfn_info from the list */
    
     /* This is one of the reasons why we can''t enforce ordering
      * between shr_lock and p2m fine-grained locks in mm-lock. 
@@ -626,56 +626,70 @@ int mem_sharing_unshare_page(struct doma
         return 0;
     }
 
-    page = mfn_to_page(mfn);
-    handle = page->shr_handle;
- 
-    hash_entry = mem_sharing_hash_lookup(handle); 
-    list_for_each(le, &hash_entry->gfns)
+    page = mem_sharing_lookup(mfn_x(mfn));
+    if ( page == NULL )
     {
-        gfn_info = list_entry(le, struct gfn_info, list);
+        gdprintk(XENLOG_ERR, "Domain p2m is shared, but page is not:
"
+                                "%lx\n", gfn);
+        BUG();
+    }
+
+    list_for_each(le, &page->shared_info->gfns)
+    {
+        gfn_info = list_entry(le, gfn_info_t, list);
         if ( (gfn_info->gfn == gfn) && (gfn_info->domain ==
d->domain_id) )
             goto gfn_found;
     }
     gdprintk(XENLOG_ERR, "Could not find gfn_info for shared gfn: "
                             "%lx\n", gfn);
     BUG();
+
 gfn_found: 
     /* Delete gfn_info from the list, but hold on to it, until we''ve
allocated
      * memory to make a copy */
-    list_del(&gfn_info->list);
-    last_gfn = list_empty(&hash_entry->gfns);
+    last_gfn = list_has_one_entry(&page->shared_info->gfns);
 
     /* If the GFN is getting destroyed drop the references to MFN 
      * (possibly freeing the page), and exit early */
     if ( flags & MEM_SHARING_DESTROY_GFN )
     {
         mem_sharing_gfn_destroy(gfn_info, !last_gfn);
-        if(last_gfn) 
-            mem_sharing_hash_delete(handle);
-        else 
+        if ( !last_gfn ) 
             /* Even though we don''t allocate a private page, we have
to account
              * for the MFN that originally backed this PFN. */
             atomic_dec(&nr_saved_mfns);
+
+        if ( last_gfn )
+        {
+            audit_del_list(page);
+            xfree(page->shared_info);
+            page->shared_info = NULL;
+        }
+
         put_gfn(d, gfn);
         shr_unlock();
         put_page_and_type(page);
-        if(last_gfn && 
-           test_and_clear_bit(_PGC_allocated, &page->count_info)) 
+        if( last_gfn && 
+            test_and_clear_bit(_PGC_allocated, &page->count_info)) 
             put_page(page);
+
         return 0;
     }
  
-    ret = page_make_private(d, page);
-    BUG_ON(last_gfn & ret);
-    if(ret == 0) goto private_page_found;
+    if ( last_gfn )
+    {
+        audit_del_list(page);
+        mem_sharing_gfn_destroy(gfn_info, !last_gfn);
+        xfree(page->shared_info);
+        page->shared_info = NULL;
+        BUG_ON(page_make_private(d, page) != 0);
+        goto private_page_found;
+    }
         
     old_page = page;
     page = mem_sharing_alloc_page(d, gfn);
-    if(!page) 
+    if ( !page ) 
     {
-        /* We''ve failed to obtain memory for private page. Need to
re-add the
-         * gfn_info to relevant list */
-        list_add(&gfn_info->list, &hash_entry->gfns);
         put_gfn(d, gfn);
         shr_unlock();
         return -ENOMEM;
@@ -687,30 +701,23 @@ gfn_found:
     unmap_domain_page(s);
     unmap_domain_page(t);
 
-    /* NOTE: set_shared_p2m_entry will switch the underlying mfn. If
-     * we do get_page withing get_gfn, the correct sequence here
-     * should be
-       get_page(page);
-       put_page(old_page);
-     * so that the ref to the old page is dropped, and a ref to
-     * the new page is obtained to later be dropped in put_gfn */
     BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0);
+    /* We''ve got a private page, we can commit the gfn destruction */
+    mem_sharing_gfn_destroy(gfn_info, !last_gfn);
     put_page_and_type(old_page);
 
 private_page_found:    
-    /* We''ve got a private page, we can commit the gfn destruction */
-    mem_sharing_gfn_destroy(gfn_info, !last_gfn);
-    if(last_gfn) 
-        mem_sharing_hash_delete(handle);
-    else
+    if ( !last_gfn ) 
         atomic_dec(&nr_saved_mfns);
 
     if ( p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != 
                                                 p2m_ram_shared ) 
     {
-        printk("Could not change p2m type.\n");
+        gdprintk(XENLOG_ERR, "Could not change p2m type d %hu gfn
%lx.\n",
+                                d->domain_id, gfn);
         BUG();
     }
+
     /* Update m2p entry */
     set_gpfn_from_mfn(mfn_x(page_to_mfn(page)), gfn);
 
@@ -767,9 +774,18 @@ int mem_sharing_domctl(struct domain *d,
 
         case XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE:
         {
-            shr_handle_t sh = mec->u.share.source_handle;
-            shr_handle_t ch = mec->u.share.client_handle;
-            rc = mem_sharing_share_pages(sh, ch); 
+            unsigned long sgfn  = mec->u.share.source_gfn;
+            shr_handle_t sh     = mec->u.share.source_handle;
+            struct domain *cd   =
get_domain_by_id(mec->u.share.client_domain);
+            if ( cd )
+            {
+                unsigned long cgfn  = mec->u.share.client_gfn;
+                shr_handle_t ch     = mec->u.share.client_handle;
+                rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); 
+                put_domain(cd);
+            }
+            else
+                return -EEXIST;
         }
         break;
 
@@ -817,6 +833,6 @@ int mem_sharing_domctl(struct domain *d,
 void __init mem_sharing_init(void)
 {
     printk("Initing memory sharing.\n");
-    mem_sharing_hash_init();
+    mm_lock_init(&shr_lock);
 }
 
diff -r 61da3fc46f06 -r 3038770886aa xen/include/asm-x86/mem_sharing.h
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -22,13 +22,23 @@
 #ifndef __MEM_SHARING_H__
 #define __MEM_SHARING_H__
 
+#include <public/domctl.h>
+
+typedef uint64_t shr_handle_t; 
+
+struct page_sharing_info
+{
+    struct page_info *pg;   /* Back pointer to the page. */
+    shr_handle_t handle;    /* Globally unique version / handle. */
+    struct list_head entry; /* List of all shared pages (entry). */
+    struct list_head gfns;  /* List of domains and gfns for this page (head).
*/
+};
+
 #ifdef __x86_64__
 
 #define sharing_supported(_d) \
     (is_hvm_domain(_d) && paging_mode_hap(_d)) 
 
-typedef uint64_t shr_handle_t; 
-
 unsigned int mem_sharing_get_nr_saved_mfns(void);
 int mem_sharing_nominate_page(struct domain *d, 
                               unsigned long gfn,
@@ -41,6 +51,7 @@ int mem_sharing_unshare_page(struct doma
 int mem_sharing_sharing_resume(struct domain *d);
 int mem_sharing_domctl(struct domain *d, 
                        xen_domctl_mem_sharing_op_t *mec);
+
 void mem_sharing_init(void);
 
 #else 
diff -r 61da3fc46f06 -r 3038770886aa xen/include/asm-x86/mm.h
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -31,6 +31,8 @@ struct page_list_entry
     __pdx_t next, prev;
 };
 
+struct page_sharing_info;
+
 struct page_info
 {
     union {
@@ -49,8 +51,13 @@ struct page_info
         /* For non-pinnable single-page shadows, a higher entry that points
          * at us. */
         paddr_t up;
-        /* For shared/sharable pages the sharing handle */
-        uint64_t shr_handle; 
+        /* For shared/sharable pages, we use a doubly-linked list
+         * of all the {pfn,domain} pairs that map this page. We also include
+         * an opaque handle, which is effectively a version, so that clients
+         * of sharing share the version they expect to.
+         * This list is allocated and freed when a page is shared/unshared.
+         */
+        struct page_sharing_info *shared_info;
     };
 
     /* Reference count and various PGC_xxx flags and fields. */
diff -r 61da3fc46f06 -r 3038770886aa xen/include/public/domctl.h
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -789,7 +789,10 @@ struct xen_domctl_mem_sharing_op {
             uint64_aligned_t  handle;     /* OUT: the handle           */
         } nominate;
         struct mem_sharing_op_share {     /* OP_SHARE */
+            uint64_aligned_t source_gfn;    /* IN: the gfn of the source page
*/
             uint64_aligned_t source_handle; /* IN: handle to the source page */
+            uint64_aligned_t client_domain; /* IN: the client domain id */
+            uint64_aligned_t client_gfn;    /* IN: the client gfn */
             uint64_aligned_t client_handle; /* IN: handle to the client page */
         } share; 
         struct mem_sharing_op_debug {     /* OP_DEBUG_xxx */

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants

xen/arch/x86/mm/mem_sharing.c |  65 +++++++++++++++++++++++++++++++++++++-----
 xen/include/public/domctl.h   |   9 +++++
 2 files changed, 65 insertions(+), 9 deletions(-)


Previosuly, the mem sharing code would return an opaque handle
to index shared pages (and nominees) in its global hash table.
By removing the hash table, the new interfaces requires a gfn
and a version. However, when sharing grants, the caller only
has a grant ref and a version. Update interface to handle this
case.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 3038770886aa -r 2e8d5702f4c1 xen/arch/x86/mm/mem_sharing.c
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -774,18 +774,65 @@ int mem_sharing_domctl(struct domain *d,
 
         case XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE:
         {
-            unsigned long sgfn  = mec->u.share.source_gfn;
-            shr_handle_t sh     = mec->u.share.source_handle;
-            struct domain *cd   =
get_domain_by_id(mec->u.share.client_domain);
-            if ( cd )
+            unsigned long sgfn, cgfn;
+            struct domain *cd;
+            shr_handle_t sh, ch;
+            int source_is_gref = 0;
+
+            if ( !mem_sharing_enabled(d) )
+                return -EINVAL;
+
+            cd = get_domain_by_id(mec->u.share.client_domain);
+            if ( !cd )
+                return -ESRCH;
+
+            if ( !mem_sharing_enabled(cd) )
             {
-                unsigned long cgfn  = mec->u.share.client_gfn;
-                shr_handle_t ch     = mec->u.share.client_handle;
-                rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); 
                 put_domain(cd);
+                return -EINVAL;
             }
-            else
-                return -EEXIST;
+
+            if (
XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.source_gfn) )
+            {
+                grant_ref_t gref = (grant_ref_t) 
+                                    (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
+                                        mec->u.share.source_gfn));
+                if ( mem_sharing_gref_to_gfn(d, gref, &sgfn) < 0 )
+                {
+                    put_domain(cd);
+                    return -EINVAL;
+                }
+                source_is_gref = 1;
+            } else {
+                sgfn  = mec->u.share.source_gfn;
+            }
+
+            if (
XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) )
+            {
+                grant_ref_t gref = (grant_ref_t) 
+                                    (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
+                                        mec->u.share.client_gfn));
+                if ( (!source_is_gref) || 
+                     (mem_sharing_gref_to_gfn(cd, gref, &cgfn) < 0) )
+                {
+                    put_domain(cd);
+                    return -EINVAL;
+                }
+            } else {
+                if ( source_is_gref )
+                {
+                    put_domain(cd);
+                    return -EINVAL;
+                }
+                cgfn  = mec->u.share.client_gfn;
+            }
+
+            sh = mec->u.share.source_handle;
+            ch = mec->u.share.client_handle;
+
+            rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); 
+
+            put_domain(cd);
         }
         break;
 
diff -r 3038770886aa -r 2e8d5702f4c1 xen/include/public/domctl.h
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -775,6 +775,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_mem_e
 #define XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID  (-10)
 #define XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID  (-9)
 
+#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG   (1ULL << 62)
+
+#define XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(field, val)  \
+    (field) = (XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG | val)
+#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(field)         \
+    ((field) & XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG)
+#define XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(field)        \
+    ((field) & (~XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG))
+
 struct xen_domctl_mem_sharing_op {
     uint8_t op; /* XEN_DOMCTL_MEM_EVENT_OP_* */

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 05 of 18] Tools: Do not assume sharing target is dom0 in libxc wrappers

tools/libxc/xc_memshr.c |  3 ++-
 tools/libxc/xenctrl.h   |  1 +
 2 files changed, 3 insertions(+), 1 deletions(-)


The libxc wrapper that shares two pages always assumed one
of the pages was mapped by dom0. Expand the API to specify
both sharing domains.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 2e8d5702f4c1 -r b398fc97ab19 tools/libxc/xc_memshr.c
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -87,6 +87,7 @@ int xc_memshr_nominate_gref(xc_interface
 }
 
 int xc_memshr_share(xc_interface *xch,
+                    uint32_t source_domain,
                     uint64_t source_handle,
                     uint64_t client_handle)
 {
@@ -95,7 +96,7 @@ int xc_memshr_share(xc_interface *xch,
 
     domctl.cmd = XEN_DOMCTL_mem_sharing_op;
     domctl.interface_version = XEN_DOMCTL_INTERFACE_VERSION;
-    domctl.domain = 0;
+    domctl.domain = (domid_t) source_domain;
     op = &(domctl.u.mem_sharing_op);
     op->op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE;
     op->u.share.source_handle = source_handle;
diff -r 2e8d5702f4c1 -r b398fc97ab19 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1892,6 +1892,7 @@ int xc_memshr_nominate_gref(xc_interface
                             grant_ref_t gref,
                             uint64_t *handle);
 int xc_memshr_share(xc_interface *xch,
+                    uint32_t source_domain,
                     uint64_t source_handle,
                     uint64_t client_handle);
 int xc_memshr_domain_resume(xc_interface *xch,

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 06 of 18] Tools: Update libxc mem sharing interface

tools/libxc/xc_memshr.c |  19 ++++++++++++++++++-
 tools/libxc/xenctrl.h   |   7 ++++++-
 2 files changed, 24 insertions(+), 2 deletions(-)


Previosuly, the mem sharing code would return an opaque handle
to index shared pages (and nominees) in its global hash table.
By removing the hash table, the handle becomes a version, to
avoid sharing a stale version of a page. Thus, libxc wrappers
need to be updated accordingly.

Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r b398fc97ab19 -r 6ad4a8da105e tools/libxc/xc_memshr.c
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -88,8 +88,13 @@ int xc_memshr_nominate_gref(xc_interface
 
 int xc_memshr_share(xc_interface *xch,
                     uint32_t source_domain,
+                    uint64_t source_gfn,
                     uint64_t source_handle,
-                    uint64_t client_handle)
+                    int source_is_gref,
+                    uint32_t client_domain,
+                    uint64_t client_gfn,
+                    uint64_t client_handle,
+                    int client_is_gref)
 {
     DECLARE_DOMCTL;
     struct xen_domctl_mem_sharing_op *op;
@@ -100,8 +105,20 @@ int xc_memshr_share(xc_interface *xch,
     op = &(domctl.u.mem_sharing_op);
     op->op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE;
     op->u.share.source_handle = source_handle;
+    op->u.share.client_domain = (uint64_t) client_domain;
     op->u.share.client_handle = client_handle;
 
+    if (source_is_gref)
+        XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(
+                op->u.share.source_gfn, source_gfn);
+    else
+        op->u.share.source_gfn = source_gfn;
+    if (client_is_gref)
+        XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(
+                op->u.share.client_gfn, client_gfn);
+    else
+        op->u.share.client_gfn = client_gfn;
+
     return do_domctl(xch, &domctl);
 }
 
diff -r b398fc97ab19 -r 6ad4a8da105e tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1893,8 +1893,13 @@ int xc_memshr_nominate_gref(xc_interface
                             uint64_t *handle);
 int xc_memshr_share(xc_interface *xch,
                     uint32_t source_domain,
+                    uint64_t source_gfn,
                     uint64_t source_handle,
-                    uint64_t client_handle);
+                    int source_is_gref,
+                    uint32_t client_domain,
+                    uint64_t client_gfn,
+                    uint64_t client_handle,
+                    int dest_is_gref);
 int xc_memshr_domain_resume(xc_interface *xch,
                             uint32_t domid);
 int xc_memshr_debug_gfn(xc_interface *xch,

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 07 of 18] Tools: Update memshr tool to use new sharing API

tools/blktap2/drivers/Makefile        |   2 +-
 tools/blktap2/drivers/tapdisk-image.c |   2 +-
 tools/blktap2/drivers/tapdisk-vbd.c   |   6 +++---
 tools/blktap2/drivers/tapdisk.h       |   6 +++++-
 tools/memshr/bidir-daemon.c           |   4 ++++
 tools/memshr/bidir-hash.h             |  13 ++++++++-----
 tools/memshr/interface.c              |  31 +++++++++++++++++++------------
 tools/memshr/memshr.h                 |  11 +++++++++--
 8 files changed, 50 insertions(+), 25 deletions(-)


The only (in-tree, that we know of) consumer of the mem sharing API
is the memshr tool (conditionally linked into blktap2). Update it to
use the new API.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/Makefile
--- a/tools/blktap2/drivers/Makefile
+++ b/tools/blktap2/drivers/Makefile
@@ -43,7 +43,7 @@ MEMSHR_DIR = $(XEN_ROOT)/tools/memshr
 MEMSHRLIBS : ifeq ($(CONFIG_Linux), __fixme__)
 CFLAGS += -DMEMSHR
-MEMSHRLIBS += $(MEMSHR_DIR)/libmemshr.a
+MEMSHRLIBS += -L$(XEN_ROOT)/tools/libxc -lxenctrl $(MEMSHR_DIR)/libmemshr.a
 endif
 
 ifeq ($(VHD_STATIC),y)
diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/tapdisk-image.c
--- a/tools/blktap2/drivers/tapdisk-image.c
+++ b/tools/blktap2/drivers/tapdisk-image.c
@@ -60,7 +60,7 @@ tapdisk_image_allocate(const char *file,
 	image->storage   = storage;
 	image->private   = private;
 #ifdef MEMSHR
-	image->memshr_id = memshr_vbd_image_get(file);
+	image->memshr_id = memshr_vbd_image_get((char *)file);
 #endif
 	INIT_LIST_HEAD(&image->next);
 
diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/tapdisk-vbd.c
--- a/tools/blktap2/drivers/tapdisk-vbd.c
+++ b/tools/blktap2/drivers/tapdisk-vbd.c
@@ -1218,14 +1218,14 @@ __tapdisk_vbd_complete_td_request(td_vbd
 #ifdef MEMSHR
 		if (treq.op == TD_OP_READ
 		   && td_flag_test(image->flags, TD_OPEN_RDONLY)) {
-			uint64_t hnd  = treq.memshr_hnd;
+			share_tuple_t hnd = treq.memshr_hnd;
 			uint16_t uid  = image->memshr_id;
 			blkif_request_t *breq = &vreq->req;
 			uint64_t sec  = tapdisk_vbd_breq_get_sector(breq, treq);
 			int secs = breq->seg[treq.sidx].last_sect -
 			    breq->seg[treq.sidx].first_sect + 1;
 
-			if (hnd != 0)
+			if (hnd.handle != 0)
 				memshr_vbd_complete_ro_request(hnd, uid,
 								sec, secs);
 		}
@@ -1297,7 +1297,7 @@ __tapdisk_vbd_reissue_td_request(td_vbd_
 				/* Reset memshr handle. This''ll prevent
 				 * memshr_vbd_complete_ro_request being called
 				 */
-				treq.memshr_hnd = 0;
+				treq.memshr_hnd.handle = 0;
 				td_complete_request(treq, 0);
 			} else
 				td_queue_read(parent, treq);
diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/tapdisk.h
--- a/tools/blktap2/drivers/tapdisk.h
+++ b/tools/blktap2/drivers/tapdisk.h
@@ -64,6 +64,10 @@
 #include "tapdisk-log.h"
 #include "tapdisk-utils.h"
 
+#ifdef MEMSHR
+#include "memshr.h"
+#endif
+
 #define DPRINTF(_f, _a...)           syslog(LOG_INFO, _f, ##_a)
 #define EPRINTF(_f, _a...)           syslog(LOG_ERR, "tap-err:%s: "
_f, __func__, ##_a)
 #define PERROR(_f, _a...)            EPRINTF(_f ": %s", ##_a,
strerror(errno))
@@ -136,7 +140,7 @@ struct td_request {
 	void                        *private;
     
 #ifdef MEMSHR
-	uint64_t                     memshr_hnd;
+	share_tuple_t                memshr_hnd;
 #endif
 };
 
diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/bidir-daemon.c
--- a/tools/memshr/bidir-daemon.c
+++ b/tools/memshr/bidir-daemon.c
@@ -48,7 +48,11 @@ void* bidir_daemon(void *unused)
             to_remove = 0.1 * max_nr_ent; 
             while(to_remove > 0) 
             {
+#if 0
                 ret = blockshr_shrhnd_remove(blks_hash, next_remove, NULL);
+#else
+                ret = -1;
+#endif
                 if(ret < 0)
                 {
                     /* We failed to remove an entry, because of a serious hash
diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/bidir-hash.h
--- a/tools/memshr/bidir-hash.h
+++ b/tools/memshr/bidir-hash.h
@@ -81,15 +81,16 @@ static int fgprtshr_mfn_cmp(uint32_t m1,
 #undef BIDIR_VALUE
 #undef BIDIR_KEY_T
 #undef BIDIR_VALUE_T
+
 /* TODO better hashes! */
 static inline uint32_t blockshr_block_hash(vbdblk_t block)
 {
     return (uint32_t)(block.sec) ^ (uint32_t)(block.disk_id);
 }
 
-static inline uint32_t blockshr_shrhnd_hash(uint64_t shrhnd)
+static inline uint32_t blockshr_shrhnd_hash(share_tuple_t shrhnd)
 {
-    return (uint32_t)shrhnd;
+    return ((uint32_t) shrhnd.handle);
 }
 
 static inline int blockshr_block_cmp(vbdblk_t b1, vbdblk_t b2)
@@ -97,15 +98,17 @@ static inline int blockshr_block_cmp(vbd
     return (b1.sec == b2.sec) && (b1.disk_id == b2.disk_id);
 }
 
-static inline int blockshr_shrhnd_cmp(uint64_t h1, uint64_t h2)
+static inline int blockshr_shrhnd_cmp(share_tuple_t h1, share_tuple_t h2)
 {
-    return (h1 == h2);
+    return ( (h1.domain == h2.domain) &&
+             (h1.frame  == h2.frame) &&
+             (h1.handle == h2.handle) );
 }
 #define BIDIR_NAME_PREFIX       blockshr
 #define BIDIR_KEY               block
 #define BIDIR_VALUE             shrhnd
 #define BIDIR_KEY_T             vbdblk_t
-#define BIDIR_VALUE_T           uint64_t
+#define BIDIR_VALUE_T           share_tuple_t
 #include "bidir-namedefs.h"
 
 #endif /* BLOCK_MAP */
diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/interface.c
--- a/tools/memshr/interface.c
+++ b/tools/memshr/interface.c
@@ -145,16 +145,17 @@ void memshr_vbd_image_put(uint16_t memsh
     
 int memshr_vbd_issue_ro_request(char *buf,
                                 grant_ref_t gref,
-                                uint16_t file_id, 
+                                uint16_t file_id,
                                 uint64_t sec, 
                                 int secs,
-                                uint64_t *hnd)
+                                share_tuple_t *hnd)
 {
     vbdblk_t blk;
-    uint64_t s_hnd, c_hnd;
+    share_tuple_t source_st, client_st;
+    uint64_t c_hnd;
     int ret;
 
-    *hnd = 0;
+    *hnd = (share_tuple_t){ 0, 0, 0 };
     if(!vbd_info.enabled) 
         return -1;
 
@@ -169,26 +170,31 @@ int memshr_vbd_issue_ro_request(char *bu
     /* If page couldn''t be made sharable, we cannot do anything about
it */
     if(ret != 0)
         return -3;
-    *hnd = c_hnd;
+
+    *(&client_st) = (share_tuple_t){ vbd_info.domid, gref, c_hnd };
+    *hnd = client_st;
 
     /* Check if we''ve read matching disk block previously */
     blk.sec     = sec;
     blk.disk_id = file_id;
-    if(blockshr_block_lookup(memshr.blks, blk, &s_hnd) > 0)
+    if(blockshr_block_lookup(memshr.blks, blk, &source_st) > 0)
     {
-        ret = xc_memshr_share(vbd_info.xc_handle, s_hnd, c_hnd);
+        ret = xc_memshr_share(vbd_info.xc_handle, source_st.domain,
source_st.frame, 1,
+                                source_st.handle, vbd_info.domid, gref, c_hnd,
1);
         if(!ret) return 0;
         /* Handles failed to be shared => at least one of them must be
invalid,
            remove the relevant ones from the map */
         switch(ret)
         {
             case XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID:
-                ret = blockshr_shrhnd_remove(memshr.blks, s_hnd, NULL);
-                if(ret) DPRINTF("Could not rm invl s_hnd:
%"PRId64"\n", s_hnd);
+                ret = blockshr_shrhnd_remove(memshr.blks, source_st, NULL);
+                if(ret) DPRINTF("Could not rm invl s_hnd: %u
%"PRId64" %"PRId64"\n",
+                                    source_st.domain, source_st.frame,
source_st.handle);
                 break;
             case XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID:
-                ret = blockshr_shrhnd_remove(memshr.blks, c_hnd, NULL);
-                if(ret) DPRINTF("Could not rm invl c_hnd:
%"PRId64"\n", c_hnd);
+                ret = blockshr_shrhnd_remove(memshr.blks, client_st, NULL);
+                if(ret) DPRINTF("Could not rm invl c_hnd: %u
%"PRId64" %"PRId64"\n",
+                                    client_st.domain, client_st.frame,
client_st.handle);
                 break;
             default:
                 break;
@@ -199,12 +205,13 @@ int memshr_vbd_issue_ro_request(char *bu
     return -4;
 }
 
-void memshr_vbd_complete_ro_request(uint64_t hnd,
+void memshr_vbd_complete_ro_request(share_tuple_t hnd,
                                     uint16_t file_id, 
                                     uint64_t sec, 
                                     int secs)
 {
     vbdblk_t blk;
+    share_tuple_t shr_tuple;
     
     if(!vbd_info.enabled) 
         return;
diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/memshr.h
--- a/tools/memshr/memshr.h
+++ b/tools/memshr/memshr.h
@@ -25,6 +25,13 @@
 
 typedef uint64_t xen_mfn_t;
 
+typedef struct share_tuple 
+{
+    uint32_t domain;
+    uint64_t frame;
+    uint64_t handle;
+} share_tuple_t;
+
 extern void memshr_set_domid(int domid);
 extern void memshr_daemon_initialize(void);
 extern void memshr_vbd_initialize(void);
@@ -35,9 +42,9 @@ extern int memshr_vbd_issue_ro_request(c
                                        uint16_t file_id, 
                                        uint64_t sec, 
                                        int secs,
-                                       uint64_t *hnd);
+                                       share_tuple_t *hnd);
 extern void memshr_vbd_complete_ro_request(
-                                       uint64_t hnd,
+                                       share_tuple_t hnd,
                                        uint16_t file_id, 
                                        uint64_t sec, 
                                        int secs);

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

tools/libxl/xl.h          |   1 +
 tools/libxl/xl_cmdimpl.c  |  85 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/xl_cmdtable.c |   6 +++
 3 files changed, 92 insertions(+), 0 deletions(-)


Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl.h
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -28,6 +28,7 @@ struct cmd_spec {
 
 int main_vcpulist(int argc, char **argv);
 int main_info(int argc, char **argv);
+int main_sharing(int argc, char **argv);
 int main_cd_eject(int argc, char **argv);
 int main_cd_insert(int argc, char **argv);
 int main_console(int argc, char **argv);
diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3755,6 +3755,91 @@ int main_info(int argc, char **argv)
     return 0;
 }
 
+static void sharing(int totals, const libxl_dominfo *info, int nb_domain)
+{
+    int i;
+
+    printf("Name                                        ID   Mem
Shared\n");
+
+    for (i = 0; i < nb_domain; i++) {
+        char *domname;
+        unsigned shutdown_reason;
+        domname = libxl_domid_to_name(ctx, info[i].domid);
+        shutdown_reason = info[i].shutdown ? info[i].shutdown_reason : 0;
+        printf("%-40s %5d %5lu  %5lu\n",
+                domname,
+                info[i].domid,
+                (unsigned long) (info[i].current_memkb / 1024),
+                (unsigned long) (info[i].shared_memkb / 1024));
+        free(domname);
+    }
+
+    if (totals)
+    {
+        /* To be added with a future patch. */
+    }
+}
+
+int main_sharing(int argc, char **argv)
+{
+    int opt;
+    int option_index = 0;
+    static struct option long_options[] = {
+        {"help", 0, 0, ''h''},
+        {"totals", 0, 0, ''t''},
+        {0, 0, 0, 0}
+    };
+    int totals = 0;
+
+    libxl_dominfo info_buf;
+    libxl_dominfo *info, *info_free=0;
+    int nb_domain, rc;
+
+    while ((opt = getopt_long(argc, argv, "ht", long_options,
&option_index)) != -1) {
+        switch (opt) {
+        case ''h'':
+            help("sharing");
+            return 0;
+        case ''t'':
+            totals = 1;
+            break;
+        default:
+            fprintf(stderr, "option `%c'' not supported.\n",
optopt);
+            break;
+        }
+    }
+
+    if (optind >= argc) {
+        info = libxl_list_domain(ctx, &nb_domain);
+        if (!info) {
+            fprintf(stderr, "libxl_domain_infolist failed.\n");
+            return 1;
+        }
+        info_free = info;
+    } else if (optind == argc-1) {
+        find_domain(argv[optind]);
+        rc = libxl_domain_info(ctx, &info_buf, domid);
+        if (rc == ERROR_INVAL) {
+            fprintf(stderr, "Error: Domain \''%s\'' does
not exist.\n",
+                argv[optind]);
+            return -rc;
+        }
+        if (rc) {
+            fprintf(stderr, "libxl_domain_info failed (code %d).\n",
rc);
+            return -rc;
+        }
+        info = &info_buf;
+        nb_domain = 1;
+    } else {
+        help("sharing");
+        return 2;
+    }
+
+    sharing(totals, info, nb_domain);
+
+    return 0;
+}
+
 static int sched_credit_domain_get(
     int domid, libxl_sched_credit *scinfo)
 {
diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -189,6 +189,12 @@ struct cmd_spec cmd_table[] = {
       "Get information about Xen host",
       "-n, --numa         List host NUMA topology information",
     },
+    { "sharing",
+      &main_sharing, 0,
+      "Get information about page sharing",
+      "[options] [Domain]", 
+      "-t, --totals              Include host totals in the output",
+    },
     { "sched-credit",
       &main_sched_credit, 0,
       "Get/set credit scheduler parameters",

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 09 of 18] x86/mm: Check how many mfns are shared, in addition to how many are saved

xen/arch/x86/mm.c                 |   6 ------
 xen/arch/x86/mm/mem_sharing.c     |  31 ++++++++++++++++++++++++++++---
 xen/arch/x86/x86_64/compat/mm.c   |   6 ++++++
 xen/arch/x86/x86_64/mm.c          |   7 +++++++
 xen/include/asm-x86/mem_sharing.h |   1 +
 xen/include/public/memory.h       |   1 +
 6 files changed, 43 insertions(+), 9 deletions(-)


This patch also moves the existing sharing-related memory op to the
correct location, and adds logic to the audit() method that uses the
new information.

This patch only provides the Xen implementation of the domctls.

Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -119,7 +119,6 @@
 #include <xen/trace.h>
 #include <asm/setup.h>
 #include <asm/fixmap.h>
-#include <asm/mem_sharing.h>
 
 /*
  * Mapping of first 2 or 4 megabytes of memory. This is mapped with 4kB
@@ -5093,11 +5092,6 @@ long arch_memory_op(int op, XEN_GUEST_HA
         return rc;
     }
 
-#ifdef __x86_64__
-    case XENMEM_get_sharing_freed_pages:
-        return mem_sharing_get_nr_saved_mfns();
-#endif
-
     default:
         return subarch_memory_op(op, arg);
     }
diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/mm/mem_sharing.c
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -75,6 +75,7 @@ static inline int mem_sharing_audit(void
 
 static shr_handle_t next_handle = 1;
 static atomic_t nr_saved_mfns   = ATOMIC_INIT(0); 
+static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
 
 typedef struct gfn_info
 {
@@ -153,9 +154,12 @@ static struct page_info* mem_sharing_loo
 static int mem_sharing_audit(void)
 {
     int errors = 0;
+    unsigned long count_expected;
+    unsigned long count_found = 0;
     struct list_head *ae;
 
     ASSERT(shr_locked_by_me());
+    count_expected = atomic_read(&nr_shared_mfns);
 
     list_for_each(ae, &shr_audit_list)
     {
@@ -194,6 +198,7 @@ static int mem_sharing_audit(void)
            errors++;
         }
 
+        count_found++;
         /* Check if all GFNs map to the MFN, and the p2m types */
         list_for_each(le, &pg->shared_info->gfns)
         {
@@ -239,6 +244,13 @@ static int mem_sharing_audit(void)
         }
     }
 
+    if ( count_found != count_expected )
+    {
+        MEM_SHARING_DEBUG("Expected %ld shared mfns, found %ld.",
+                          count_expected, count_found);
+        errors++;
+    }
+
     return errors;
 }
 #endif
@@ -296,6 +308,11 @@ unsigned int mem_sharing_get_nr_saved_mf
     return ((unsigned int)atomic_read(&nr_saved_mfns));
 }
 
+unsigned int mem_sharing_get_nr_shared_mfns(void)
+{
+    return (unsigned int)atomic_read(&nr_shared_mfns);
+}
+
 int mem_sharing_sharing_resume(struct domain *d)
 {
     mem_event_response_t rsp;
@@ -570,10 +587,11 @@ int mem_sharing_share_pages(struct domai
         BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn) == 0);
         if ( single_client_gfn )
         {
-            /* Only increase the per-domain count when we are actually
+            /* Only increase the stats counts when we are actually
              * sharing. And don''t increase it should we ever re-share
*/
             atomic_inc(&d->shr_pages);
             ASSERT( cd == d );
+            atomic_inc(&nr_shared_mfns);
         }
         put_domain(d);
     }
@@ -654,10 +672,14 @@ gfn_found:
     if ( flags & MEM_SHARING_DESTROY_GFN )
     {
         mem_sharing_gfn_destroy(gfn_info, !last_gfn);
-        if ( !last_gfn ) 
+        if ( last_gfn )
+        {
+            atomic_dec(&nr_shared_mfns);
+        } else {
             /* Even though we don''t allocate a private page, we have
to account
              * for the MFN that originally backed this PFN. */
             atomic_dec(&nr_saved_mfns);
+        }
 
         if ( last_gfn )
         {
@@ -707,8 +729,11 @@ gfn_found:
     put_page_and_type(old_page);
 
 private_page_found:    
-    if ( !last_gfn ) 
+    if ( last_gfn ) {
+        atomic_dec(&nr_shared_mfns);
+    } else {
         atomic_dec(&nr_saved_mfns);
+    }
 
     if ( p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != 
                                                 p2m_ram_shared ) 
diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/x86_64/compat/mm.c
--- a/xen/arch/x86/x86_64/compat/mm.c
+++ b/xen/arch/x86/x86_64/compat/mm.c
@@ -205,6 +205,12 @@ int compat_arch_memory_op(int op, XEN_GU
         break;
     }
 
+    case XENMEM_get_sharing_freed_pages:
+        return mem_sharing_get_nr_saved_mfns();
+
+    case XENMEM_get_sharing_shared_pages:
+        return mem_sharing_get_nr_shared_mfns();
+
     default:
         rc = -ENOSYS;
         break;
diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/x86_64/mm.c
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -34,6 +34,7 @@
 #include <asm/msr.h>
 #include <asm/setup.h>
 #include <asm/numa.h>
+#include <asm/mem_sharing.h>
 #include <public/memory.h>
 
 /* Parameters for PFN/MADDR compression. */
@@ -1090,6 +1091,12 @@ long subarch_memory_op(int op, XEN_GUEST
 
         break;
 
+    case XENMEM_get_sharing_freed_pages:
+        return mem_sharing_get_nr_saved_mfns();
+
+    case XENMEM_get_sharing_shared_pages:
+        return mem_sharing_get_nr_shared_mfns();
+
     default:
         rc = -ENOSYS;
         break;
diff -r 24d514cd4dee -r 65b32b391373 xen/include/asm-x86/mem_sharing.h
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -40,6 +40,7 @@ struct page_sharing_info
     (is_hvm_domain(_d) && paging_mode_hap(_d)) 
 
 unsigned int mem_sharing_get_nr_saved_mfns(void);
+unsigned int mem_sharing_get_nr_shared_mfns(void);
 int mem_sharing_nominate_page(struct domain *d, 
                               unsigned long gfn,
                               int expected_refcnt,
diff -r 24d514cd4dee -r 65b32b391373 xen/include/public/memory.h
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -294,6 +294,7 @@ typedef struct xen_pod_target xen_pod_ta
  * The call never fails. 
  */
 #define XENMEM_get_sharing_freed_pages    18
+#define XENMEM_get_sharing_shared_pages   19
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 10 of 18] Tools: Expose to libxc the total number of shared frames and space saved

tools/libxc/xc_private.c |  10 ++++++++++
 tools/libxc/xenctrl.h    |   4 ++++
 2 files changed, 14 insertions(+), 0 deletions(-)


Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 65b32b391373 -r 6f489a294a76 tools/libxc/xc_private.c
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -533,6 +533,16 @@ long xc_maximum_ram_page(xc_interface *x
     return do_memory_op(xch, XENMEM_maximum_ram_page, NULL, 0);
 }
 
+long xc_sharing_freed_pages(xc_interface *xch)
+{
+    return do_memory_op(xch, XENMEM_get_sharing_freed_pages, NULL, 0);
+}
+
+long xc_sharing_used_frames(xc_interface *xch)
+{
+    return do_memory_op(xch, XENMEM_get_sharing_shared_pages, NULL, 0);
+}
+
 long long xc_domain_get_cpu_usage( xc_interface *xch, domid_t domid, int vcpu )
 {
     DECLARE_DOMCTL;
diff -r 65b32b391373 -r 6f489a294a76 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1227,6 +1227,10 @@ int xc_mmuext_op(xc_interface *xch, stru
 /* System wide memory properties */
 long xc_maximum_ram_page(xc_interface *xch);
 
+long xc_sharing_freed_pages(xc_interface *xch);
+
+long xc_sharing_used_frames(xc_interface *xch);
+
 /* Get current total pages allocated to a domain. */
 long xc_get_tot_pages(xc_interface *xch, uint32_t domid);

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 11 of 18] Tools: Allow libxl/xl to expose the total number of shared frames and space saved

tools/libxl/Makefile     |   2 +-
 tools/libxl/xl_cmdimpl.c |  13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)


Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 6f489a294a76 -r cde3529132c1 tools/libxl/Makefile
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -89,7 +89,7 @@ libxl_internal.h: _libxl_types_internal.
 libxl_internal_json.h: _libxl_types_internal_json.h
 
 $(LIBXL_OBJS) $(LIBXLU_OBJS) $(XL_OBJS): libxl.h
-$(LIBXL_OBJS): libxl_internal.h
+$(LIBXL_OBJS) $(XL_OBJS): libxl_internal.h
 
 _libxl_type%.h _libxl_type%_json.h _libxl_type%.c: libxl_type%.idl gentypes.py
libxltypes.py
 	$(PYTHON) gentypes.py libxl_type$*.idl __libxl_type$*.h __libxl_type$*_json.h
__libxl_type$*.c
diff -r 6f489a294a76 -r cde3529132c1 tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -38,6 +38,7 @@
 
 #include "libxl.h"
 #include "libxl_utils.h"
+#include "libxl_internal.h"
 #include "libxlutil.h"
 #include "xl.h"
 
@@ -3758,6 +3759,7 @@ int main_info(int argc, char **argv)
 static void sharing(int totals, const libxl_dominfo *info, int nb_domain)
 {
     int i;
+    const libxl_version_info *vinfo;
 
     printf("Name                                        ID   Mem
Shared\n");
 
@@ -3774,9 +3776,14 @@ static void sharing(int totals, const li
         free(domname);
     }
 
-    if (totals)
-    {
-        /* To be added with a future patch. */
+    if (totals) {
+        vinfo = libxl_get_version_info(ctx);
+        if (vinfo) {
+            i = (1 << 20) / vinfo->pagesize;
+            printf("Freed with sharing: %ld  Total physical shared:
%ld\n",
+            xc_sharing_freed_pages(ctx->xch) / i,
+            xc_sharing_used_frames(ctx->xch) / i);
+        }
     }
 }

Andres Lagar-Cavilla

2011-Dec-08 07:47 UTC

head link

[PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

xen/arch/x86/mm.c        |  4 ++--
 xen/include/asm-x86/mm.h |  3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)


This is necessary for a new consumer of page_lock/unlock to follow in
the series.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r cde3529132c1 -r ecf95feef9ac xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1718,7 +1718,7 @@ static int free_l4_table(struct page_inf
 #define free_l4_table(page, preemptible) (-EINVAL)
 #endif
 
-static int page_lock(struct page_info *page)
+int page_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -1735,7 +1735,7 @@ static int page_lock(struct page_info *p
     return 1;
 }
 
-static void page_unlock(struct page_info *page)
+void page_unlock(struct page_info *page)
 {
     unsigned long x, nx, y = page->u.inuse.type_info;
 
diff -r cde3529132c1 -r ecf95feef9ac xen/include/asm-x86/mm.h
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -337,6 +337,9 @@ int is_iomem_page(unsigned long mfn);
 
 void clear_superpage_mark(struct page_info *page);
 
+int page_lock(struct page_info *page);
+void page_unlock(struct page_info *page);
+
 struct domain *page_get_owner_and_reference(struct page_info *page);
 void put_page(struct page_info *page);
 int  get_page(struct page_info *page, struct domain *domain);

Tim Deegan

2011-Dec-08 11:11 UTC

head link

Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

At 02:47 -0500 on 08 Dec (1323312436), Andres Lagar-Cavilla
wrote:> Harmless patch, with no functional changes, only code style issues.
> 
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
...
> -    return -2;
> +    return 0;
Ahem. :)

Tim.

Tim Deegan

2011-Dec-08 11:16 UTC

head link

Re: [PATCH 02 of 18] x86/mm: Making a page sharable sets PGT_validated, but making a page private doesn''t expect it

At 02:47 -0500 on 08 Dec (1323312437), Andres Lagar-Cavilla
wrote:> diff -r aeebbff899ff -r 61da3fc46f06 xen/arch/x86/mm.c
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4347,7 +4347,7 @@ int page_make_private(struct domain *d, 
>  
>      /* We can only change the type if count is one */
>      if ( (page->u.inuse.type_info & (PGT_type_mask |
PGT_count_mask))
> -         != (PGT_shared_page | 1) )
> +         != (PGT_shared_page | PGT_validated | 1) )
This seems wrong - surely PGT_validated isn''t in (PGT_type_mask |
PGT_count_mask).

Tim.

Andres Lagar-Cavilla

2011-Dec-08 16:16 UTC

head link

Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

> At 02:47 -0500 on 08 Dec (1323312436), Andres Lagar-Cavilla wrote:
>> Harmless patch, with no functional changes, only code style issues.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> ...
>
>> -    return -2;
>> +    return 0;
>
> Ahem. :)
>
> Tim.Borderline, I''ll admit. I can play down the tone of the header --
nothing
is really harmless :)

The function will still produce the same outputs on the same inputs, with
cleaner code.

Thanks
Andres>

Tim Deegan

2011-Dec-08 21:54 UTC

head link

Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

At 08:16 -0800 on 08 Dec (1323332187), Andres Lagar-Cavilla
wrote:> > At 02:47 -0500 on 08 Dec (1323312436), Andres Lagar-Cavilla wrote:
> >> Harmless patch, with no functional changes, only code style
issues.
> >>
> >> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
> >
> > ...
> >
> >> -    return -2;
> >> +    return 0;
> >
> > Ahem. :)
> >
> > Tim.
> Borderline, I''ll admit. I can play down the tone of the header --
nothing
> is really harmless :)
> 
> The function will still produce the same outputs on the same inputs, with
> cleaner code.
Ah, so it will.  I''ll apply this, but in future I''d like long
patches
that do tedious whitespace fixups to _just_ have whitespace fixups. 
It reduces the chance of nasty surprises (and the chance that I''ll have
to read a long patch of tedious whitespace fixups more than once!)

Tim.

Tim Deegan

2011-Dec-08 22:13 UTC

head link

Re: [PATCH 03 of 18] x86/mm: Eliminate hash table in sharing code as index of shared mfns

At 02:47 -0500 on 08 Dec (1323312438), Andres Lagar-Cavilla
wrote:>  xen/arch/x86/mm/mem_sharing.c     |  526
+++++++++++++++++++------------------
>  xen/include/asm-x86/mem_sharing.h |   15 +-
>  xen/include/asm-x86/mm.h          |   11 +-
>  xen/include/public/domctl.h       |    3 +
>  4 files changed, 296 insertions(+), 259 deletions(-)
> 
> 
> The hashtable mechanism used by the sharing code is a bit odd.  It seems
> unnecessary and adds complexity.  Instead, we replace this by storing a
list
> head directly in the page info for the case when the page is shared.  This
does
> not add any extra space to the page_info and serves to remove significant
> complexity from sharing.
> 
> Signed-off-by: Adin Scannell <adin@scannell.ca>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
I like the look of this but I won''t apply until Olaf is happy with the
interface changes.

One nit: 
> diff -r 61da3fc46f06 -r 3038770886aa xen/include/asm-x86/mem_sharing.h
> --- a/xen/include/asm-x86/mem_sharing.h
> +++ b/xen/include/asm-x86/mem_sharing.h
> @@ -22,13 +22,23 @@
>  #ifndef __MEM_SHARING_H__
>  #define __MEM_SHARING_H__
>  
> +#include <public/domctl.h>
> +
> +typedef uint64_t shr_handle_t; 
> +
> +struct page_sharing_info
> +{
> +    struct page_info *pg;   /* Back pointer to the page. */
> +    shr_handle_t handle;    /* Globally unique version / handle. */
> +    struct list_head entry; /* List of all shared pages (entry). */
IIUC this list is only used if the sharing audit code is anabled, so
it should probably be appropritaely #ifdeffed out to save space in the
non-audit case. 

Cheers,

Tim.
> +    struct list_head gfns;  /* List of domains and gfns for this page
(head). */
> +};
> +
>  #ifdef __x86_64__
>

Tim Deegan

2011-Dec-08 22:20 UTC

head link

Re: [PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants

Hi, 

At 02:47 -0500 on 08 Dec (1323312439), Andres Lagar-Cavilla
wrote:> +            if (
XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) )
> +            {
> +                grant_ref_t gref = (grant_ref_t) 
> +                                   
(XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
> +                                        mec->u.share.client_gfn));
> +                if ( (!source_is_gref) || 
> +                     (mem_sharing_gref_to_gfn(cd, gref, &cgfn) < 0)
)
> +                {
> +                    put_domain(cd);
> +                    return -EINVAL;
> +                }
> +            } else {
> +                if ( source_is_gref )
> +                {
> +                    put_domain(cd);
> +                    return -EINVAL;
> +                }
Why can''t one input be a grant and the other not?  If there''s
a problem
with that it should probably be documented in the hypercall interface. 

Cheers,

Tim.

Tim Deegan

2011-Dec-08 22:27 UTC

head link

Re: [PATCH 09 of 18] x86/mm: Check how many mfns are shared, in addition to how many are saved

At 02:47 -0500 on 08 Dec (1323312444), Andres Lagar-Cavilla
wrote:> This patch also moves the existing sharing-related memory op to the
> correct location, and adds logic to the audit() method that uses the
> new information.
> 
> This patch only provides the Xen implementation of the domctls.
> 
> Signed-off-by: Adin Scannell <adin@scannell.ca>
Acked-by: Tim Deegan <tim@xen.org>  (once patches #3 and #4 are in).

Tim.

Tim Deegan

2011-Dec-08 22:38 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla
wrote:> This is necessary for a new consumer of page_lock/unlock to follow in
> the series.
> 
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Nak, I''m afraid. 

These were OK as local functions but if they''re going to be made
generally visible, they need clear comments describing what this
locking protects and what the discipline is for avoiding deadlocks.

Perhaps Jan or Keir can supply appropriate words.  The locking was
introduce in this cset:

    changeset:   17846:09dd5999401b
    user:        Keir Fraser <keir.fraser@citrix.com>
    date:        Thu Jun 12 18:14:00 2008 +0100
    files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c
    xen/arch/x86/mm.c
    description:
    x86: remove use of per-domain lock from page table entry handling
    
    This change results in a 5% performance improvement for kernel builds
    on dual-socket quad-core systems (which is what I used for reference
    for both 32- and 64-bit). Along with that, the amount of time reported
    as spent in the kernel gets reduced by almost 25% (the fraction of
    time spent in the kernel is generally reported significantly higher
    under Xen than with a native kernel).
    
    Signed-off-by: Jan Beulich <jbeulich@novell.com>
    Signed-off-by: Keir Fraser <keir.fraser@citrix.com>    

Tim.

Keir Fraser

2011-Dec-08 23:06 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org> wrote:
> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>> This is necessary for a new consumer of page_lock/unlock to follow in
>> the series.
>> 
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> 
> Nak, I''m afraid. 
> 
> These were OK as local functions but if they''re going to be made
> generally visible, they need clear comments describing what this
> locking protects and what the discipline is for avoiding deadlocks.
> 
> Perhaps Jan or Keir can supply appropriate words.  The locking was
> introduce in this cset:
It''s Jan''s work originally, but the basic intention of
page_lock is to
serialise pte updates. To aid with this, a page''s type cannot change
while
its lock is held. No lock nests inside a page lock (not even other page
locks) so there is no deadlock risk.
>     changeset:   17846:09dd5999401b
>     user:        Keir Fraser <keir.fraser@citrix.com>
>     date:        Thu Jun 12 18:14:00 2008 +0100
>     files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c
>     xen/arch/x86/mm.c
>     description:
>     x86: remove use of per-domain lock from page table entry handling
>     
>     This change results in a 5% performance improvement for kernel builds
>     on dual-socket quad-core systems (which is what I used for reference
>     for both 32- and 64-bit). Along with that, the amount of time reported
>     as spent in the kernel gets reduced by almost 25% (the fraction of
>     time spent in the kernel is generally reported significantly higher
>     under Xen than with a native kernel).
>     
>     Signed-off-by: Jan Beulich <jbeulich@novell.com>
>     Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
> 
> Tim.

Andres Lagar-Cavilla

2011-Dec-09 02:54 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>> This is necessary for a new consumer of page_lock/unlock to follow in
>> the series.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> Nak, I''m afraid.
>
> These were OK as local functions but if they''re going to be made
> generally visible, they need clear comments describing what this
> locking protects and what the discipline is for avoiding deadlocks.
How about something along the lines of
"page_lock() is used for two purposes: pte serialization, and memory
sharing. All users of page lock for pte serialization live in mm.c, use it
to lock a page table page during pte updates, do not take other locks
within the critical section delimited by page_lock/unlock, and perform no
nesting. All users of page lock for memory sharing live in
mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
locking) two pages -- deadlock is avoided by locking pages in increasing
order. Memory sharing may take the p2m_lock within a page_lock/unlock
critical section. These two users (pte serialization and memory sharing)
should never collide, as long as page table pages are properly unshared
prior to updating."

Now those are all pretty words, but here are the two things I (think I)
need to do:
- Prior to updating pte''s, we do get_gfn on the page table page. We
should
be using get_gfn_unshare. Regardless of this patch. It''s likely never
going to trigger an actual unshare, yet better safe than sorry.
- I can wrap uses of page_lock in mm sharing in an "external"
order-enforcing construct from mm-locks.h. And use that to scream deadlock
between page_lock and p2m_lock.

The code that actually uses page_lock()s in the memory sharing code can be
found in "[PATCH] x86/mm: Eliminate global lock in mem sharing code".
It
already orders locking of individual pages in ascending order.

Andres
>
> Perhaps Jan or Keir can supply appropriate words.  The locking was
> introduce in this cset:
>
>     changeset:   17846:09dd5999401b
>     user:        Keir Fraser <keir.fraser@citrix.com>
>     date:        Thu Jun 12 18:14:00 2008 +0100
>     files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c
>     xen/arch/x86/mm.c
>     description:
>     x86: remove use of per-domain lock from page table entry handling
>
>     This change results in a 5% performance improvement for kernel builds
>     on dual-socket quad-core systems (which is what I used for reference
>     for both 32- and 64-bit). Along with that, the amount of time reported
>     as spent in the kernel gets reduced by almost 25% (the fraction of
>     time spent in the kernel is generally reported significantly higher
>     under Xen than with a native kernel).
>
>     Signed-off-by: Jan Beulich <jbeulich@novell.com>
>     Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>
> Tim.
>

Andres Lagar-Cavilla

2011-Dec-09 02:57 UTC

head link

Re: [PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants

> Hi,
>
> At 02:47 -0500 on 08 Dec (1323312439), Andres Lagar-Cavilla wrote:
>> +            if (
>> XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) )
>> +            {
>> +                grant_ref_t gref = (grant_ref_t)
>> +
>> (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
>> +                                        mec->u.share.client_gfn));
>> +                if ( (!source_is_gref) ||
>> +                     (mem_sharing_gref_to_gfn(cd, gref, &cgfn)
< 0) )
>> +                {
>> +                    put_domain(cd);
>> +                    return -EINVAL;
>> +                }
>> +            } else {
>> +                if ( source_is_gref )
>> +                {
>> +                    put_domain(cd);
>> +                    return -EINVAL;
>> +                }
>
> Why can''t one input be a grant and the other not?  If
there''s a problem
> with that it should probably be documented in the hypercall interface.
>No particular problem. The gref use case is the current target of
tools/memshr, which always does gref-to-gref sharing. I imagine
gref-to-gfn sharing to be extremely unlikely.

But there is no need to disallow it.

Andres> Cheers,
>
> Tim.
>

Andres Lagar-Cavilla

2011-Dec-09 03:01 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

> On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org> wrote:
>
>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>> This is necessary for a new consumer of page_lock/unlock to follow
in
>>> the series.
>>>
>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>
>> Nak, I''m afraid.
>>
>> These were OK as local functions but if they''re going to be
made
>> generally visible, they need clear comments describing what this
>> locking protects and what the discipline is for avoiding deadlocks.
>>
>> Perhaps Jan or Keir can supply appropriate words.  The locking was
>> introduce in this cset:
>
> It''s Jan''s work originally, but the basic intention of
page_lock is to
> serialise pte updates. To aid with this, a page''s type cannot
change while
> its lock is held.That''s definitely a property I want to leverage.
> No lock nests inside a page lock (not even other page
> locks) so there is no deadlock risk.There''s no way to not nest when sharing two pages, but I always make
sure
I lock in ascending order.

Thanks,
Andres>
>>     changeset:   17846:09dd5999401b
>>     user:        Keir Fraser <keir.fraser@citrix.com>
>>     date:        Thu Jun 12 18:14:00 2008 +0100
>>     files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c
>>     xen/arch/x86/mm.c
>>     description:
>>     x86: remove use of per-domain lock from page table entry handling
>>
>>     This change results in a 5% performance improvement for kernel
>> builds
>>     on dual-socket quad-core systems (which is what I used for
reference
>>     for both 32- and 64-bit). Along with that, the amount of time
>> reported
>>     as spent in the kernel gets reduced by almost 25% (the fraction of
>>     time spent in the kernel is generally reported significantly higher
>>     under Xen than with a native kernel).
>>
>>     Signed-off-by: Jan Beulich <jbeulich@novell.com>
>>     Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>>
>> Tim.
>
>
>

Keir Fraser

2011-Dec-09 08:17 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

On 09/12/2011 03:01, "Andres Lagar-Cavilla"
<andres@lagarcavilla.org> wrote:
>> On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org> wrote:
>> 
>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>> This is necessary for a new consumer of page_lock/unlock to
follow in
>>>> the series.
>>>> 
>>>> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
>>> 
>>> Nak, I''m afraid.
>>> 
>>> These were OK as local functions but if they''re going to
be made
>>> generally visible, they need clear comments describing what this
>>> locking protects and what the discipline is for avoiding deadlocks.
>>> 
>>> Perhaps Jan or Keir can supply appropriate words.  The locking was
>>> introduce in this cset:
>> 
>> It''s Jan''s work originally, but the basic intention
of page_lock is to
>> serialise pte updates. To aid with this, a page''s type cannot
change while
>> its lock is held.
> That''s definitely a property I want to leverage.
> 
>> No lock nests inside a page lock (not even other page
>> locks) so there is no deadlock risk.
> There''s no way to not nest when sharing two pages, but I always
make sure
> I lock in ascending order.
The fact there is currently no nesting gives you some freedom. Ordered
locking of other page locks is obviously going to be safe. So is taking a
page lock inside any other lock. Taking some other lock inside a page lock
is all that needs care, but there probably aren''t many locks that
currently
can have page locks nested inside them (and hence you would risk deadlock by
nesting the other way).

 -- Keir
> Thanks,
> Andres
>> 
>>>     changeset:   17846:09dd5999401b
>>>     user:        Keir Fraser <keir.fraser@citrix.com>
>>>     date:        Thu Jun 12 18:14:00 2008 +0100
>>>     files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c
>>>     xen/arch/x86/mm.c
>>>     description:
>>>     x86: remove use of per-domain lock from page table entry
handling
>>> 
>>>     This change results in a 5% performance improvement for kernel
>>> builds
>>>     on dual-socket quad-core systems (which is what I used for
reference
>>>     for both 32- and 64-bit). Along with that, the amount of time
>>> reported
>>>     as spent in the kernel gets reduced by almost 25% (the fraction
of
>>>     time spent in the kernel is generally reported significantly
higher
>>>     under Xen than with a native kernel).
>>> 
>>>     Signed-off-by: Jan Beulich <jbeulich@novell.com>
>>>     Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>>> 
>>> Tim.
>> 
>> 
>> 
> 
>

Jan Beulich

2011-Dec-09 08:51 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla"
<andres@lagarcavilla.org> wrote:
>>  At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>> This is necessary for a new consumer of page_lock/unlock to follow
in
>>> the series.
>>>
>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>
>> Nak, I''m afraid.
>>
>> These were OK as local functions but if they''re going to be
made
>> generally visible, they need clear comments describing what this
>> locking protects and what the discipline is for avoiding deadlocks.
> 
> How about something along the lines of
> "page_lock() is used for two purposes: pte serialization, and memory
> sharing. All users of page lock for pte serialization live in mm.c, use it
> to lock a page table page during pte updates, do not take other locks
> within the critical section delimited by page_lock/unlock, and perform no
> nesting. All users of page lock for memory sharing live in
> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
> locking) two pages -- deadlock is avoided by locking pages in increasing
> order. Memory sharing may take the p2m_lock within a page_lock/unlock
> critical section. These two users (pte serialization and memory sharing)
> should never collide, as long as page table pages are properly unshared
> prior to updating."
This would seem to me like very undesirable lock ordering - a very
coarse (per-domain) lock taken inside a very fine grained (per-page)
one.
> Now those are all pretty words, but here are the two things I (think I)
> need to do:
> - Prior to updating pte''s, we do get_gfn on the page table page.
We should
> be using get_gfn_unshare. Regardless of this patch. It''s likely
never
> going to trigger an actual unshare, yet better safe than sorry.
Does memory sharing work on pv domains at all?
> - I can wrap uses of page_lock in mm sharing in an "external"
> order-enforcing construct from mm-locks.h. And use that to scream deadlock
> between page_lock and p2m_lock.
> 
> The code that actually uses page_lock()s in the memory sharing code can be
> found in "[PATCH] x86/mm: Eliminate global lock in mem sharing
code". It
> already orders locking of individual pages in ascending order.
It should be this patch to make the functions externally visible, not a
separate one (or at the very minimum the two ought to be in the same
series, back to back). Which is not to say that I''m fully convinced
this
is a good idea.

Jan

Ian Jackson

2011-Dec-09 09:59 UTC

head link

Re: [PATCH 06 of 18] Tools: Update libxc mem sharing interface

Andres Lagar-Cavilla writes ("[PATCH 06 of 18] Tools: Update libxc mem
sharing interface"):>  tools/libxc/xc_memshr.c |  19 ++++++++++++++++++-
>  tools/libxc/xenctrl.h   |   7 ++++++-
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> 
> Previosuly, the mem sharing code would return an opaque handle
> to index shared pages (and nominees) in its global hash table.
> By removing the hash table, the handle becomes a version, to
> avoid sharing a stale version of a page. Thus, libxc wrappers
> need to be updated accordingly.
> 
> Signed-off-by: Adin Scannell <adin@scannell.ca>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

(assuming the hypervisor changes are OK, obviously)

Ian.

Ian Jackson

2011-Dec-09 09:59 UTC

head link

Re: [PATCH 07 of 18] Tools: Update memshr tool to use new sharing API

Andres Lagar-Cavilla writes ("[PATCH 07 of 18] Tools: Update memshr tool to
use new sharing API"):> The only (in-tree, that we know of) consumer of the mem sharing API
> is the memshr tool (conditionally linked into blktap2). Update it to
> use the new API.
> 
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian Jackson

2011-Dec-09 10:01 UTC

head link

Re: [PATCH 05 of 18] Tools: Do not assume sharing target is dom0 in libxc wrappers

Andres Lagar-Cavilla writes ("[PATCH 05 of 18] Tools: Do not assume sharing
target is dom0 in libxc wrappers"):> The libxc wrapper that shares two pages always assumed one
> of the pages was mapped by dom0. Expand the API to specify
> both sharing domains.
> 
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian Jackson

2011-Dec-09 10:02 UTC

head link

Re: [PATCH 11 of 18] Tools: Allow libxl/xl to expose the total number of shared frames and space saved

Andres Lagar-Cavilla writes ("[PATCH 11 of 18] Tools: Allow libxl/xl to
expose the total number of shared frames and space
saved"):> diff -r 6f489a294a76 -r cde3529132c1 tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -38,6 +38,7 @@
>  
>  #include "libxl.h"
>  #include "libxl_utils.h"
> +#include "libxl_internal.h"
No, this is not correct.  xl should not include libxl_internal.h.  Nor
should it make libxc calls directly.

Ian.

Ian Jackson

2011-Dec-09 10:08 UTC

head link

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

Andres Lagar-Cavilla writes ("[PATCH 08 of 18] Tools: Add a sharing command
to xl for information about shared pages"):> +static void sharing(int totals, const libxl_dominfo *info, int nb_domain)
> +{
> +    int i;
> +
> +    printf("Name                                        ID   Mem
Shared\n");
> +
> +    for (i = 0; i < nb_domain; i++) {
> +        char *domname;
> +        unsigned shutdown_reason;
> +        domname = libxl_domid_to_name(ctx, info[i].domid);
> +        shutdown_reason = info[i].shutdown ? info[i].shutdown_reason : 0;
> +        printf("%-40s %5d %5lu  %5lu\n",
> +                domname,
> +                info[i].domid,
> +                (unsigned long) (info[i].current_memkb / 1024),
> +                (unsigned long) (info[i].shared_memkb / 1024));
> +        free(domname);
> +    }
> +
> +    if (totals)
> +    {
> +        /* To be added with a future patch. */
> +    }
> +}
Is this analogous to an "xm sharing" command ?

Perhaps we should try to integrate this information in the output of
"xl list" (although we do have some backward compatibility issues
there).

Maybe we need to invent "xl llist" or "xl list -v" or
something.

Ian.

Ian Campbell

2011-Dec-09 10:10 UTC

head link

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

On Thu, 2011-12-08 at 07:47 +0000, Andres Lagar-Cavilla
wrote:> diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl_cmdtable.c
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -189,6 +189,12 @@ struct cmd_spec cmd_table[] = {
>        "Get information about Xen host",
>        "-n, --numa         List host NUMA topology information",
>      },
> +    { "sharing",
> +      &main_sharing, 0,
> +      "Get information about page sharing",
> +      "[options] [Domain]", 
> +      "-t, --totals              Include host totals in the
output",
> +    },
>      { "sched-credit",
>        &main_sched_credit, 0,
>        "Get/set credit scheduler parameters",
Please also patch docs/man/xl.pod.1 when adding new commands.

Ian.

Ian Jackson

2011-Dec-09 11:29 UTC

head link

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

Ian Campbell writes ("Re: [PATCH 08 of 18] Tools: Add a sharing command to
xl for information about shared pages"):> Please also patch docs/man/xl.pod.1 when adding new commands.
Oh, good point, yes.

Ian.

Andres Lagar-Cavilla

2011-Dec-09 14:43 UTC

head link

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

> Andres Lagar-Cavilla writes ("[PATCH 08 of 18] Tools: Add a sharing
> command to xl for information about shared pages"):
>> +static void sharing(int totals, const libxl_dominfo *info, int
>> nb_domain)
>> +{
>> +    int i;
>> +
>> +    printf("Name                                        ID   Mem
>> Shared\n");
>> +
>> +    for (i = 0; i < nb_domain; i++) {
>> +        char *domname;
>> +        unsigned shutdown_reason;
>> +        domname = libxl_domid_to_name(ctx, info[i].domid);
>> +        shutdown_reason = info[i].shutdown ? info[i].shutdown_reason :
>> 0;
>> +        printf("%-40s %5d %5lu  %5lu\n",
>> +                domname,
>> +                info[i].domid,
>> +                (unsigned long) (info[i].current_memkb / 1024),
>> +                (unsigned long) (info[i].shared_memkb / 1024));
>> +        free(domname);
>> +    }
>> +
>> +    if (totals)
>> +    {
>> +        /* To be added with a future patch. */
>> +    }
>> +}
>
> Is this analogous to an "xm sharing" command ?
>
> Perhaps we should try to integrate this information in the output of
> "xl list" (although we do have some backward compatibility issues
> there).
>
> Maybe we need to invent "xl llist" or "xl list -v" or
something.I seem to recall that several weeks ago, when Adin first posted this
patch, he was asked to not break xl list compatibility with xm list. Hence
the separate command.

Will also try to add doc to xl.pod.1

Thanks
Andres>
> Ian.
>

Andres Lagar-Cavilla

2011-Dec-09 14:47 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

> On 09/12/2011 03:01, "Andres Lagar-Cavilla"
<andres@lagarcavilla.org>
> wrote:
>
>>> On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org>
wrote:
>>>
>>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla
wrote:
>>>>> This is necessary for a new consumer of page_lock/unlock to
follow in
>>>>> the series.
>>>>>
>>>>> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
>>>>
>>>> Nak, I''m afraid.
>>>>
>>>> These were OK as local functions but if they''re going
to be made
>>>> generally visible, they need clear comments describing what
this
>>>> locking protects and what the discipline is for avoiding
deadlocks.
>>>>
>>>> Perhaps Jan or Keir can supply appropriate words.  The locking
was
>>>> introduce in this cset:
>>>
>>> It''s Jan''s work originally, but the basic
intention of page_lock is to
>>> serialise pte updates. To aid with this, a page''s type
cannot change
>>> while
>>> its lock is held.
>> That''s definitely a property I want to leverage.
>>
>>> No lock nests inside a page lock (not even other page
>>> locks) so there is no deadlock risk.
>> There''s no way to not nest when sharing two pages, but I
always make
>> sure
>> I lock in ascending order.
>
> The fact there is currently no nesting gives you some freedom. Ordered
> locking of other page locks is obviously going to be safe. So is taking a
> page lock inside any other lock. Taking some other lock inside a page lock
> is all that needs care, but there probably aren''t many locks that
> currently
> can have page locks nested inside them (and hence you would risk deadlock
> by
> nesting the other way).For now, p2m_lock will be taken inside page_lock, to type p2m entries or
make them point to the shared page. Later, when p2m lookups become
synchronized, p2m_lock (or similar) will be taken before page_lock, in
get_gfn* -- and that will be better.

Either way, I will use mm-locks.h to make sure that if there is any chance
of deadlock the hypervisor screams.

Thanks
Andres>
>  -- Keir
>
>> Thanks,
>> Andres
>>>
>>>>     changeset:   17846:09dd5999401b
>>>>     user:        Keir Fraser <keir.fraser@citrix.com>
>>>>     date:        Thu Jun 12 18:14:00 2008 +0100
>>>>     files:       xen/arch/x86/domain.c
xen/arch/x86/domain_build.c
>>>>     xen/arch/x86/mm.c
>>>>     description:
>>>>     x86: remove use of per-domain lock from page table entry
handling
>>>>
>>>>     This change results in a 5% performance improvement for
kernel
>>>> builds
>>>>     on dual-socket quad-core systems (which is what I used for
>>>> reference
>>>>     for both 32- and 64-bit). Along with that, the amount of
time
>>>> reported
>>>>     as spent in the kernel gets reduced by almost 25% (the
fraction of
>>>>     time spent in the kernel is generally reported
significantly
>>>> higher
>>>>     under Xen than with a native kernel).
>>>>
>>>>     Signed-off-by: Jan Beulich <jbeulich@novell.com>
>>>>     Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>>>>
>>>> Tim.
>>>
>>>
>>>
>>
>>
>
>
>

Andres Lagar-Cavilla

2011-Dec-09 14:53 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

>>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla"
<andres@lagarcavilla.org>
>>>> wrote:
>>>  At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>> This is necessary for a new consumer of page_lock/unlock to
follow in
>>>> the series.
>>>>
>>>> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
>>>
>>> Nak, I''m afraid.
>>>
>>> These were OK as local functions but if they''re going to
be made
>>> generally visible, they need clear comments describing what this
>>> locking protects and what the discipline is for avoiding deadlocks.
>>
>> How about something along the lines of
>> "page_lock() is used for two purposes: pte serialization, and
memory
>> sharing. All users of page lock for pte serialization live in mm.c, use
>> it
>> to lock a page table page during pte updates, do not take other locks
>> within the critical section delimited by page_lock/unlock, and perform
>> no
>> nesting. All users of page lock for memory sharing live in
>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing
>> (and
>> locking) two pages -- deadlock is avoided by locking pages in
increasing
>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>> critical section. These two users (pte serialization and memory
sharing)
>> should never collide, as long as page table pages are properly unshared
>> prior to updating."
>
> This would seem to me like very undesirable lock ordering - a very
> coarse (per-domain) lock taken inside a very fine grained (per-page)
> one.I''m not sure I follow. Unshare would do its work, and then pte
serialization would start. The two pieces of code will be disjoint,
locking-wise.

Now it is true that during unshare we need to take the p2m lock to change
the p2m entry. That''s a very strong reason to make the p2m lock
fine-grained. But I need to start somewhere, so I''m breaking up the
global
shr_lock first.
>
>> Now those are all pretty words, but here are the two things I (think I)
>> need to do:
>> - Prior to updating pte''s, we do get_gfn on the page table
page. We
>> should
>> be using get_gfn_unshare. Regardless of this patch. It''s
likely never
>> going to trigger an actual unshare, yet better safe than sorry.
>
> Does memory sharing work on pv domains at all?Not. At. All :)

I can _not_ add the unshare. It''s idempotent to pv.
>
>> - I can wrap uses of page_lock in mm sharing in an "external"
>> order-enforcing construct from mm-locks.h. And use that to scream
>> deadlock
>> between page_lock and p2m_lock.
>>
>> The code that actually uses page_lock()s in the memory sharing code can
>> be
>> found in "[PATCH] x86/mm: Eliminate global lock in mem sharing
code". It
>> already orders locking of individual pages in ascending order.
>
> It should be this patch to make the functions externally visible, not a
> separate one (or at the very minimum the two ought to be in the same
> series, back to back). Which is not to say that I''m fully
convinced this
> is a good idea.Whichever you prefer. I''m of the mind of making shorter patches when
possible, that do one thing, to ease readability. But I can collapse the
two.

Thanks
Andres>
> Jan
>
>

Tim Deegan

2011-Dec-09 14:57 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

At 18:54 -0800 on 08 Dec (1323370449), Andres Lagar-Cavilla
wrote:> > At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
> >> This is necessary for a new consumer of page_lock/unlock to follow
in
> >> the series.
> >>
> >> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
> >
> > Nak, I''m afraid.
> >
> > These were OK as local functions but if they''re going to be
made
> > generally visible, they need clear comments describing what this
> > locking protects and what the discipline is for avoiding deadlocks.
> 
> How about something along the lines of
> "page_lock() is used for two purposes: pte serialization, and memory
> sharing. All users of page lock for pte serialization live in mm.c, use it
> to lock a page table page during pte updates, do not take other locks
> within the critical section delimited by page_lock/unlock, and perform no
> nesting. All users of page lock for memory sharing live in
> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
> locking) two pages -- deadlock is avoided by locking pages in increasing
> order. Memory sharing may take the p2m_lock within a page_lock/unlock
> critical section. These two users (pte serialization and memory sharing)
> should never collide, as long as page table pages are properly unshared
> prior to updating."
Thanks.  Extracting from that and from Keir''s response: 

It serves both as a spinlock and to exclude any to the page-type of the
page in question (by causing the get_page_type() functions to spin).

What it currently protects is all modifications to pages that have
pagetable type.  This serialises PV PTE validations, both against other
validations of the same PTE and against concurrent changes of the
enclosing page''s type.

Your planned use is to protect updates to the page-sharing state
associated with a page.  Can you be clear about what exactly is protected?

The proposed locking discipline is that
- page locks must be taken in increasing order of MFN, yes?  
- and that you must always take page locks before the p2m lock?

Is that about right?

Tim.

Andres Lagar-Cavilla

2011-Dec-09 14:59 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

> At 18:54 -0800 on 08 Dec (1323370449), Andres Lagar-Cavilla wrote:
>> > At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>> >> This is necessary for a new consumer of page_lock/unlock to
follow in
>> >> the series.
>> >>
>> >> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
>> >
>> > Nak, I''m afraid.
>> >
>> > These were OK as local functions but if they''re going to
be made
>> > generally visible, they need clear comments describing what this
>> > locking protects and what the discipline is for avoiding
deadlocks.
>>
>> How about something along the lines of
>> "page_lock() is used for two purposes: pte serialization, and
memory
>> sharing. All users of page lock for pte serialization live in mm.c, use
>> it
>> to lock a page table page during pte updates, do not take other locks
>> within the critical section delimited by page_lock/unlock, and perform
>> no
>> nesting. All users of page lock for memory sharing live in
>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing
>> (and
>> locking) two pages -- deadlock is avoided by locking pages in
increasing
>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>> critical section. These two users (pte serialization and memory
sharing)
>> should never collide, as long as page table pages are properly unshared
>> prior to updating."
>
> Thanks.  Extracting from that and from Keir''s response:
>
> It serves both as a spinlock and to exclude any to the page-type of the
> page in question (by causing the get_page_type() functions to spin).
>
> What it currently protects is all modifications to pages that have
> pagetable type.  This serialises PV PTE validations, both against other
> validations of the same PTE and against concurrent changes of the
> enclosing page''s type.
>
> Your planned use is to protect updates to the page-sharing state
> associated with a page.  Can you be clear about what exactly is protected?"Hanging" from a shared page, there is a list of all the
gfn''s it backs.
These are (gfn,domain) tuples. So every time we share or unshare, we need
protected access to that list.
>
> The proposed locking discipline is that
> - page locks must be taken in increasing order of MFN, yes?
> - and that you must always take page locks before the p2m lock?
Yes and yes>
> Is that about right?Indeed

Thanks
Andres>
> Tim.
>

Keir Fraser

2011-Dec-09 15:02 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

On 09/12/2011 14:57, "Tim Deegan" <tim@xen.org> wrote:
> At 18:54 -0800 on 08 Dec (1323370449), Andres Lagar-Cavilla wrote:
>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>> This is necessary for a new consumer of page_lock/unlock to
follow in
>>>> the series.
>>>> 
>>>> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
>>> 
>>> Nak, I''m afraid.
>>> 
>>> These were OK as local functions but if they''re going to
be made
>>> generally visible, they need clear comments describing what this
>>> locking protects and what the discipline is for avoiding deadlocks.
>> 
>> How about something along the lines of
>> "page_lock() is used for two purposes: pte serialization, and
memory
>> sharing. All users of page lock for pte serialization live in mm.c, use
it
>> to lock a page table page during pte updates, do not take other locks
>> within the critical section delimited by page_lock/unlock, and perform
no
>> nesting. All users of page lock for memory sharing live in
>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing
(and
>> locking) two pages -- deadlock is avoided by locking pages in
increasing
>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>> critical section. These two users (pte serialization and memory
sharing)
>> should never collide, as long as page table pages are properly unshared
>> prior to updating."
> 
> Thanks.  Extracting from that and from Keir''s response:
> 
> It serves both as a spinlock and to exclude any to the page-type of the
> page in question (by causing the get_page_type() functions to spin).
It does not cause the get_page_type() functions to spin. An attempt to get
another reference to the current type will succeed; an attempt to change the
type will immediately fail. From the p.o.v. of the type-tracking logic,
page_lock() simply takes a reference to the current type.

 -- Keir
> What it currently protects is all modifications to pages that have
> pagetable type.  This serialises PV PTE validations, both against other
> validations of the same PTE and against concurrent changes of the
> enclosing page''s type.
> 
> Your planned use is to protect updates to the page-sharing state
> associated with a page.  Can you be clear about what exactly is protected?
> 
> The proposed locking discipline is that
> - page locks must be taken in increasing order of MFN, yes?
> - and that you must always take page locks before the p2m lock?
> 
> Is that about right?
> 
> Tim.

Jan Beulich

2011-Dec-09 15:06 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

>>> On 09.12.11 at 15:53, "Andres Lagar-Cavilla"
<andres@lagarcavilla.org> wrote:
>>>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla"
<andres@lagarcavilla.org>
>>>>> wrote:
>>>>  At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla
wrote:
>>>>> This is necessary for a new consumer of page_lock/unlock to
follow in
>>>>> the series.
>>>>>
>>>>> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
>>>>
>>>> Nak, I''m afraid.
>>>>
>>>> These were OK as local functions but if they''re going
to be made
>>>> generally visible, they need clear comments describing what
this
>>>> locking protects and what the discipline is for avoiding
deadlocks.
>>>
>>> How about something along the lines of
>>> "page_lock() is used for two purposes: pte serialization, and
memory
>>> sharing. All users of page lock for pte serialization live in mm.c,
use
>>> it
>>> to lock a page table page during pte updates, do not take other
locks
>>> within the critical section delimited by page_lock/unlock, and
perform
>>> no
>>> nesting. All users of page lock for memory sharing live in
>>> mm/mem_sharing.c. For memory sharing, nesting may happen when
sharing
>>> (and
>>> locking) two pages -- deadlock is avoided by locking pages in
increasing
>>> order. Memory sharing may take the p2m_lock within a
page_lock/unlock
>>> critical section. These two users (pte serialization and memory
sharing)
>>> should never collide, as long as page table pages are properly
unshared
>>> prior to updating."
>>
>> This would seem to me like very undesirable lock ordering - a very
>> coarse (per-domain) lock taken inside a very fine grained (per-page)
>> one.
> I''m not sure I follow. Unshare would do its work, and then pte
> serialization would start. The two pieces of code will be disjoint,
> locking-wise.
But your original mail said "Memory sharing may take the p2m_lock
within a page_lock/unlock critical section" - see above. That''s
what
I''m referring to.
> Now it is true that during unshare we need to take the p2m lock to change
> the p2m entry. That''s a very strong reason to make the p2m lock
> fine-grained. But I need to start somewhere, so I''m breaking up
the global
> shr_lock first.
I don''t really think that it''ll be reasonable to split up the
p2m lock.
>>> Now those are all pretty words, but here are the two things I
(think I)
>>> need to do:
>>> - Prior to updating pte''s, we do get_gfn on the page table
page. We
>>> should
>>> be using get_gfn_unshare. Regardless of this patch. It''s
likely never
>>> going to trigger an actual unshare, yet better safe than sorry.
>>
>> Does memory sharing work on pv domains at all?
> Not. At. All :)
> 
> I can _not_ add the unshare. It''s idempotent to pv.
Perhaps I should have clarified why I was asking: The pte handling is
a pv-only thing, and if memory sharing is hvm only, then the two can''t
ever conflict.
>>> - I can wrap uses of page_lock in mm sharing in an
"external"
>>> order-enforcing construct from mm-locks.h. And use that to scream
>>> deadlock
>>> between page_lock and p2m_lock.
>>>
>>> The code that actually uses page_lock()s in the memory sharing code
can
>>> be
>>> found in "[PATCH] x86/mm: Eliminate global lock in mem sharing
code". It
>>> already orders locking of individual pages in ascending order.
>>
>> It should be this patch to make the functions externally visible, not a
>> separate one (or at the very minimum the two ought to be in the same
>> series, back to back). Which is not to say that I''m fully
convinced this
>> is a good idea.
> Whichever you prefer. I''m of the mind of making shorter patches
when
> possible, that do one thing, to ease readability. But I can collapse the
> two.
In quite a few (recent) cases your patches did something where the user
of the change wasn''t obvious at all (in some cases - I tried to point
these
out - there was no user even at the end of a series). While I agree that
shorter patches are easier to review, trivial changes like the one here
should imo really be logically grouped with what requires them.

Jan

Andres Lagar-Cavilla

2011-Dec-09 17:34 UTC

head link

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

>>>> On 09.12.11 at 15:53, "Andres Lagar-Cavilla"
<andres@lagarcavilla.org>
>>>> wrote:
>>>>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla"
>>>>>> <andres@lagarcavilla.org>
>>>>>> wrote:
>>>>>  At 02:47 -0500 on 08 Dec (1323312447), Andres
Lagar-Cavilla wrote:
>>>>>> This is necessary for a new consumer of
page_lock/unlock to follow
>>>>>> in
>>>>>> the series.
>>>>>>
>>>>>> Signed-off-by: Andres Lagar-Cavilla
<andres@lagarcavilla.org>
>>>>>
>>>>> Nak, I''m afraid.
>>>>>
>>>>> These were OK as local functions but if they''re
going to be made
>>>>> generally visible, they need clear comments describing what
this
>>>>> locking protects and what the discipline is for avoiding
deadlocks.
>>>>
>>>> How about something along the lines of
>>>> "page_lock() is used for two purposes: pte serialization,
and memory
>>>> sharing. All users of page lock for pte serialization live in
mm.c,
>>>> use
>>>> it
>>>> to lock a page table page during pte updates, do not take other
locks
>>>> within the critical section delimited by page_lock/unlock, and
perform
>>>> no
>>>> nesting. All users of page lock for memory sharing live in
>>>> mm/mem_sharing.c. For memory sharing, nesting may happen when
sharing
>>>> (and
>>>> locking) two pages -- deadlock is avoided by locking pages in
>>>> increasing
>>>> order. Memory sharing may take the p2m_lock within a
page_lock/unlock
>>>> critical section. These two users (pte serialization and memory
>>>> sharing)
>>>> should never collide, as long as page table pages are properly
>>>> unshared
>>>> prior to updating."
>>>
>>> This would seem to me like very undesirable lock ordering - a very
>>> coarse (per-domain) lock taken inside a very fine grained
(per-page)
>>> one.
>> I''m not sure I follow. Unshare would do its work, and then pte
>> serialization would start. The two pieces of code will be disjoint,
>> locking-wise.
>
> But your original mail said "Memory sharing may take the p2m_lock
> within a page_lock/unlock critical section" - see above.
That''s what
> I''m referring to.
>
>> Now it is true that during unshare we need to take the p2m lock to
>> change
>> the p2m entry. That''s a very strong reason to make the p2m
lock
>> fine-grained. But I need to start somewhere, so I''m breaking
up the
>> global
>> shr_lock first.
>
> I don''t really think that it''ll be reasonable to split up
the p2m lock.
>
>>>> Now those are all pretty words, but here are the two things I
(think
>>>> I)
>>>> need to do:
>>>> - Prior to updating pte''s, we do get_gfn on the page
table page. We
>>>> should
>>>> be using get_gfn_unshare. Regardless of this patch.
It''s likely never
>>>> going to trigger an actual unshare, yet better safe than sorry.
>>>
>>> Does memory sharing work on pv domains at all?
>> Not. At. All :)
>>
>> I can _not_ add the unshare. It''s idempotent to pv.
>
> Perhaps I should have clarified why I was asking: The pte handling is
> a pv-only thing, and if memory sharing is hvm only, then the two
can''t
> ever conflict.
Grant mapping uses the page_lock. I believe there is work trying to make
pv backends work in hvm?

If so, best to add unshare of the page table page now. Should a
free-wheeling user-space sharer try to share page table pages....

Andres>
>>>> - I can wrap uses of page_lock in mm sharing in an
"external"
>>>> order-enforcing construct from mm-locks.h. And use that to
scream
>>>> deadlock
>>>> between page_lock and p2m_lock.
>>>>
>>>> The code that actually uses page_lock()s in the memory sharing
code
>>>> can
>>>> be
>>>> found in "[PATCH] x86/mm: Eliminate global lock in mem
sharing code".
>>>> It
>>>> already orders locking of individual pages in ascending order.
>>>
>>> It should be this patch to make the functions externally visible,
not a
>>> separate one (or at the very minimum the two ought to be in the
same
>>> series, back to back). Which is not to say that I''m fully
convinced
>>> this
>>> is a good idea.
>> Whichever you prefer. I''m of the mind of making shorter
patches when
>> possible, that do one thing, to ease readability. But I can collapse
the
>> two.
>
> In quite a few (recent) cases your patches did something where the user
> of the change wasn''t obvious at all (in some cases - I tried to
point
> these
> out - there was no user even at the end of a series). While I agree that
> shorter patches are easier to review, trivial changes like the one here
> should imo really be logically grouped with what requires them.
>
> Jan
>

Xen devel - Dec 2011 - [PATCH 00 of 18] Memory sharing overhaul

[PATCH 00 of 18] Memory sharing overhaul

[PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

[PATCH 02 of 18] x86/mm: Making a page sharable sets PGT_validated, but making a page private doesn''t expect it

[PATCH 03 of 18] x86/mm: Eliminate hash table in sharing code as index of shared mfns

[PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants

[PATCH 05 of 18] Tools: Do not assume sharing target is dom0 in libxc wrappers

[PATCH 06 of 18] Tools: Update libxc mem sharing interface

[PATCH 07 of 18] Tools: Update memshr tool to use new sharing API

[PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

[PATCH 09 of 18] x86/mm: Check how many mfns are shared, in addition to how many are saved

[PATCH 10 of 18] Tools: Expose to libxc the total number of shared frames and space saved

[PATCH 11 of 18] Tools: Allow libxl/xl to expose the total number of shared frames and space saved

[PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

Re: [PATCH 02 of 18] x86/mm: Making a page sharable sets PGT_validated, but making a page private doesn''t expect it

Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c

Re: [PATCH 03 of 18] x86/mm: Eliminate hash table in sharing code as index of shared mfns

Re: [PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants

Re: [PATCH 09 of 18] x86/mm: Check how many mfns are shared, in addition to how many are saved

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 06 of 18] Tools: Update libxc mem sharing interface

Re: [PATCH 07 of 18] Tools: Update memshr tool to use new sharing API

Re: [PATCH 05 of 18] Tools: Do not assume sharing target is dom0 in libxc wrappers

Re: [PATCH 11 of 18] Tools: Allow libxl/xl to expose the total number of shared frames and space saved

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable

Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable