This patch series makes a set of changes to the sharing support in the hypervisor and tools. Specifically:

- Granular, scalable locking: lock MFNs individually when sharing/unsharing them, rather than relying on a global lock. Use RCU to maintain a list of all shared frames for audit purposes.
- Refreshed stats: they now work (tm), and are accessible via libxc and the console.
- Basic libxl/xl support for sharing.
- A new sharing command, "add to physmap", to directly populate a new domain with shared frames.
- Toolstack and memshr kept in sync.

A consequence of our modifications is a change to the sharing API. While we refresh the only in-tree user of the sharing API, we want to make sure we are not affecting any other users of that API.

Patches 9, 10, and 11 are tools patches; patches 1 to 8 are x86/mm. The hypervisor bits apply on top of the patch "x86/mm: Improve ring management for memory events. Do not lose guest events", currently under review. Patch 12 is a simple tool that showcases the new functionality and facilitates testing.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>

 xen/arch/x86/mm/mem_sharing.c | 550 +++++++++++++++++-----------------
 xen/include/asm-x86/mem_sharing.h | 19 +-
 xen/include/asm-x86/mm.h | 11 +-
 xen/include/public/domctl.h | 3 +
 xen/arch/x86/mm/mem_sharing.c | 57 +++-
 xen/include/public/domctl.h | 9 +
 xen/arch/x86/mm.c | 74 +---
 xen/arch/x86/mm/mem_sharing.c | 257 +++++++++++++++-
 xen/arch/x86/mm/mm-locks.h | 15 +-
 xen/include/asm-x86/mm.h | 27 +-
 xen/arch/x86/mm/mem_sharing.c | 91 +++--
 xen/arch/x86/mm/mm-locks.h | 18 +-
 xen/include/asm-x86/mm.h | 3 +-
 xen/arch/x86/mm.c | 6 -
 xen/arch/x86/mm/mem_sharing.c | 24 +
 xen/arch/x86/x86_64/compat/mm.c | 6 +
 xen/arch/x86/x86_64/mm.c | 7 +
 xen/include/asm-x86/mem_sharing.h | 1 +
 xen/include/public/memory.h | 1 +
 xen/arch/x86/mm/mem_sharing.c | 104 ++++++
 xen/include/public/domctl.h | 3 +-
 xen/arch/ia64/xen/mm.c | 5 +
 xen/arch/x86/mm.c | 14 +
 xen/common/keyhandler.c | 7 +-
 xen/include/xen/mm.h | 3 +
 xen/arch/x86/mm/mem_sharing.c | 94 ++---
 xen/arch/x86/mm/mm-locks.h | 18 -
 xen/include/asm-x86/mem_sharing.h | 3 +-
 tools/blktap2/drivers/tapdisk-vbd.c | 6 +-
 tools/blktap2/drivers/tapdisk.h | 2 +-
 tools/libxc/xc_memshr.c | 63 +++-
 tools/libxc/xenctrl.h | 19 +-
 tools/memshr/bidir-daemon.c | 34 +-
 tools/memshr/bidir-hash.h | 12 +-
 tools/memshr/interface.c | 30 +-
 tools/memshr/memshr.h | 11 +-
 tools/libxc/xc_memshr.c | 10 +
 tools/libxc/xenctrl.h | 17 +
 docs/man/xl.pod.1 | 13 +
 tools/libxl/libxl.c | 2 +
 tools/libxl/libxl_types.idl | 2 +
 tools/libxl/xl.h | 1 +
 tools/libxl/xl_cmdimpl.c | 66 ++++
 tools/libxl/xl_cmdtable.c | 5 +
 tools/tests/mem-sharing/Makefile | 26 +
 tools/tests/mem-sharing/memshrtool.c | 165 ++++++++++
 46 files changed, 1371 insertions(+), 543 deletions(-)
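As a rough illustration of the refreshed stats, a toolstack can query the two counters the series exposes (patch 5 adds XENMEM_get_sharing_shared_pages alongside the existing XENMEM_get_sharing_freed_pages). The sketch below assumes a generic xc_memory_op()-style wrapper for issuing the hypercall; that wrapper is a placeholder, not one of the libxc entry points added by this series:

    #include <stdio.h>
    #include <xenctrl.h>

    /* Minimal sketch: read the sharing counters. Both ops take no argument
     * and simply return a count (the call never fails). xc_memory_op() is
     * a hypothetical stand-in for the actual libxc plumbing. */
    static void print_sharing_stats(xc_interface *xch)
    {
        long freed  = xc_memory_op(xch, XENMEM_get_sharing_freed_pages, NULL);
        long shared = xc_memory_op(xch, XENMEM_get_sharing_shared_pages, NULL);

        printf("shared frames: %ld, frames freed by sharing: %ld\n",
               shared, freed);
    }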
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 01 of 12] x86/mm: Eliminate hash table in sharing code as index of shared mfns
xen/arch/x86/mm/mem_sharing.c | 550 ++++++++++++++++++------------------- xen/include/asm-x86/mem_sharing.h | 19 +- xen/include/asm-x86/mm.h | 11 +- xen/include/public/domctl.h | 3 + 4 files changed, 302 insertions(+), 281 deletions(-) The hashtable mechanism used by the sharing code is a bit odd. It seems unnecessary and adds complexity. Instead, we replace this by storing a list head directly in the page info for the case when the page is shared. This does not add any extra space to the page_info and serves to remove significant complexity from sharing. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 302a58874eb1 -r 2a4112172e44 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -3,6 +3,7 @@ * * Memory sharing support. * + * Copyright (c) 2011 GridCentric, Inc. (Adin Scannell & Andres Lagar-Cavilla) * Copyright (c) 2009 Citrix Systems, Inc. (Grzegorz Milos) * * This program is free software; you can redistribute it and/or modify @@ -34,15 +35,30 @@ #include "mm-locks.h" -/* Auditing of memory sharing code? */ -#define MEM_SHARING_AUDIT 0 - #if MEM_SHARING_AUDIT -static void mem_sharing_audit(void); +static int mem_sharing_audit(void); #define MEM_SHARING_DEBUG(_f, _a...) \ debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a) +static struct list_head shr_audit_list; + +static inline void audit_add_list(struct page_info *page) +{ + INIT_LIST_HEAD(&page->shared_info->entry); + list_add(&page->shared_info->entry, &shr_audit_list); +} + +static inline void audit_del_list(struct page_info *page) +{ + list_del(&page->shared_info->entry); +} #else -#define mem_sharing_audit() do {} while(0) +static inline int mem_sharing_audit(void) +{ + return 0; +} + +#define audit_add_list(p) ((void)0) +#define audit_del_list(p) ((void)0) #endif /* MEM_SHARING_AUDIT */ #define mem_sharing_enabled(d) \ @@ -58,17 +74,6 @@ static void mem_sharing_audit(void); static shr_handle_t next_handle = 1; static atomic_t nr_saved_mfns = ATOMIC_INIT(0); -typedef struct shr_hash_entry -{ - shr_handle_t handle; - mfn_t mfn; - struct shr_hash_entry *next; - struct list_head gfns; -} shr_hash_entry_t; - -#define SHR_HASH_LENGTH 1000 -static shr_hash_entry_t *shr_hash[SHR_HASH_LENGTH]; - typedef struct gfn_info { unsigned long gfn; @@ -89,166 +94,153 @@ static inline gfn_info_t *gfn_get_info(s return list_entry(list->next, gfn_info_t, list); } -static void __init mem_sharing_hash_init(void) +static inline gfn_info_t *mem_sharing_gfn_alloc(struct page_info *page, + struct domain *d, + unsigned long gfn) { - int i; + gfn_info_t *gfn_info = xmalloc(gfn_info_t); - mm_lock_init(&shr_lock); - for(i=0; i<SHR_HASH_LENGTH; i++) - shr_hash[i] = NULL; + if ( gfn_info == NULL ) + return NULL; + + gfn_info->gfn = gfn; + gfn_info->domain = d->domain_id; + INIT_LIST_HEAD(&gfn_info->list); + list_add(&gfn_info->list, &page->shared_info->gfns); + + /* Increment our number of shared pges. */ + atomic_inc(&d->shr_pages); + + return gfn_info; } -static shr_hash_entry_t *mem_sharing_hash_alloc(void) +static inline void mem_sharing_gfn_destroy(struct domain *d, + gfn_info_t *gfn_info) { - return xmalloc(shr_hash_entry_t); + /* Decrement the number of pages. */ + atomic_dec(&d->shr_pages); + + /* Free the gfn_info structure. 
*/ + list_del(&gfn_info->list); + xfree(gfn_info); } -static void mem_sharing_hash_destroy(shr_hash_entry_t *e) +static struct page_info* mem_sharing_lookup(unsigned long mfn) { - xfree(e); -} - -static gfn_info_t *mem_sharing_gfn_alloc(void) -{ - return xmalloc(gfn_info_t); -} - -static void mem_sharing_gfn_destroy(gfn_info_t *gfn, int was_shared) -{ - /* Decrement the number of pages, if the gfn was shared before */ - if ( was_shared ) + if ( mfn_valid(_mfn(mfn)) ) { - struct domain *d = get_domain_by_id(gfn->domain); - /* Domain may have been destroyed by now * - * (if we are called from p2m_teardown) */ - if ( d ) + struct page_info* page = mfn_to_page(_mfn(mfn)); + if ( page_get_owner(page) == dom_cow ) { - atomic_dec(&d->shr_pages); - put_domain(d); + ASSERT(page->u.inuse.type_info & PGT_type_mask); + ASSERT(get_gpfn_from_mfn(mfn) == SHARED_M2P_ENTRY); + return page; } } - xfree(gfn); -} - -static shr_hash_entry_t* mem_sharing_hash_lookup(shr_handle_t handle) -{ - shr_hash_entry_t *e; - - e = shr_hash[handle % SHR_HASH_LENGTH]; - while(e != NULL) - { - if(e->handle == handle) - return e; - e = e->next; - } return NULL; } -static shr_hash_entry_t* mem_sharing_hash_insert(shr_handle_t handle, mfn_t mfn) +#if MEM_SHARING_AUDIT +static int mem_sharing_audit(void) { - shr_hash_entry_t *e, **ee; - - e = mem_sharing_hash_alloc(); - if(e == NULL) return NULL; - e->handle = handle; - e->mfn = mfn; - ee = &shr_hash[handle % SHR_HASH_LENGTH]; - e->next = *ee; - *ee = e; - return e; -} - -static void mem_sharing_hash_delete(shr_handle_t handle) -{ - shr_hash_entry_t **pprev, *e; - - pprev = &shr_hash[handle % SHR_HASH_LENGTH]; - e = *pprev; - while(e != NULL) - { - if(e->handle == handle) - { - *pprev = e->next; - mem_sharing_hash_destroy(e); - return; - } - pprev = &e->next; - e = e->next; - } - printk("Could not find shr entry for handle %"PRIx64"\n", handle); - BUG(); -} - -#if MEM_SHARING_AUDIT -static void mem_sharing_audit(void) -{ - shr_hash_entry_t *e; - struct list_head *le; - gfn_info_t *g; - int bucket; - struct page_info *pg; + int errors = 0; + struct list_head *ae; ASSERT(shr_locked_by_me()); - for(bucket=0; bucket < SHR_HASH_LENGTH; bucket++) + list_for_each(ae, &shr_audit_list) { - e = shr_hash[bucket]; - /* Loop over all shr_hash_entries */ - while(e != NULL) + struct page_sharing_info *shared_info; + unsigned long nr_gfns = 0; + struct page_info *pg; + struct list_head *le; + mfn_t mfn; + + shared_info = list_entry(ae, struct page_sharing_info, entry); + pg = shared_info->pg; + mfn = page_to_mfn(pg); + + /* Check if the MFN has correct type, owner and handle. */ + if ( !(pg->u.inuse.type_info & PGT_shared_page) ) { - int nr_gfns=0; + MEM_SHARING_DEBUG("mfn %lx in audit list, but not PGT_shared_page (%d)!\n", + mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask); + errors++; + continue; + } - /* Check if the MFN has correct type, owner and handle */ - pg = mfn_to_page(e->mfn); - if((pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page) - MEM_SHARING_DEBUG("mfn %lx not shared, but in the hash!\n", - mfn_x(e->mfn)); - if(page_get_owner(pg) != dom_cow) - MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner (%d)!\n", - mfn_x(e->mfn), - page_get_owner(pg)->domain_id); - if(e->handle != pg->shr_handle) - MEM_SHARING_DEBUG("mfn %lx shared, but wrong handle " - "(%ld != %ld)!\n", - mfn_x(e->mfn), pg->shr_handle, e->handle); - /* Check if all GFNs map to the MFN, and the p2m types */ - list_for_each(le, &e->gfns) + /* Check the page owner. 
*/ + if ( page_get_owner(pg) != dom_cow ) + { + MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner (%d)!\n", + mfn_x(mfn), page_get_owner(pg)->domain_id); + errors++; + } + + /* Check the m2p entry */ + if ( get_gpfn_from_mfn(mfn_x(mfn)) != SHARED_M2P_ENTRY ) + { + MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry (%lx)!\n", + mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn))); + errors++; + } + + /* Check we have a list */ + if ( (!pg->shared_info) || (list_empty(&pg->shared_info->gfns)) ) + { + MEM_SHARING_DEBUG("mfn %lx shared, but empty gfn list!\n", + mfn_x(mfn)); + errors++; + continue; + } + + /* Check if all GFNs map to the MFN, and the p2m types */ + list_for_each(le, &pg->shared_info->gfns) + { + struct domain *d; + p2m_type_t t; + mfn_t o_mfn; + gfn_info_t *g; + + g = list_entry(le, gfn_info_t, list); + d = get_domain_by_id(g->domain); + if ( d == NULL ) { - struct domain *d; - p2m_type_t t; - mfn_t mfn; - - g = list_entry(le, struct gfn_info, list); - d = get_domain_by_id(g->domain); - if(d == NULL) - { - MEM_SHARING_DEBUG("Unknow dom: %d, for PFN=%lx, MFN=%lx\n", - g->domain, g->gfn, mfn_x(e->mfn)); - continue; - } - mfn = get_gfn_unlocked(d, g->gfn, &t); - if(mfn_x(mfn) != mfn_x(e->mfn)) - MEM_SHARING_DEBUG("Incorrect P2M for d=%d, PFN=%lx." - "Expecting MFN=%ld, got %ld\n", - g->domain, g->gfn, mfn_x(e->mfn), - mfn_x(mfn)); - if(t != p2m_ram_shared) - MEM_SHARING_DEBUG("Incorrect P2M type for d=%d, PFN=%lx." - "Expecting t=%d, got %d\n", - g->domain, g->gfn, mfn_x(e->mfn), - p2m_ram_shared, t); - put_domain(d); - nr_gfns++; - } - if(nr_gfns != (pg->u.inuse.type_info & PGT_count_mask)) - MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx." - "nr_gfns in hash %d, in type_info %d\n", - mfn_x(e->mfn), nr_gfns, - (pg->u.inuse.type_info & PGT_count_mask)); - e = e->next; + MEM_SHARING_DEBUG("Unknown dom: %d, for PFN=%lx, MFN=%lx\n", + g->domain, g->gfn, mfn); + errors++; + continue; + } + o_mfn = get_gfn_query_unlocked(d, g->gfn, &t); + if ( mfn_x(o_mfn) != mfn_x(mfn) ) + { + MEM_SHARING_DEBUG("Incorrect P2M for d=%d, PFN=%lx." + "Expecting MFN=%ld, got %ld\n", + g->domain, g->gfn, mfn, mfn_x(o_mfn)); + errors++; + } + if ( t != p2m_ram_shared ) + { + MEM_SHARING_DEBUG("Incorrect P2M type for d=%d, PFN=%lx." + "Expecting t=%d, got %d\n", + g->domain, g->gfn, mfn, p2m_ram_shared, t); + errors++; + } + put_domain(d); + nr_gfns++; + } + if ( nr_gfns != (pg->u.inuse.type_info & PGT_count_mask) ) + { + MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx." + "nr_gfns in hash %d, in type_info %d\n", + mfn_x(mfn), nr_gfns, + (pg->u.inuse.type_info & PGT_count_mask)); + errors++; } } + + return errors; } #endif @@ -383,36 +375,6 @@ static int mem_sharing_gref_to_gfn(struc return 0; } -/* Account for a GFN being shared/unshared. - * When sharing this function needs to be called _before_ gfn lists are merged - * together, but _after_ gfn is removed from the list when unsharing. - */ -static int mem_sharing_gfn_account(struct gfn_info *gfn, int sharing) -{ - struct domain *d; - - /* A) When sharing: - * if the gfn being shared is in > 1 long list, its already been - * accounted for - * B) When unsharing: - * if the list is longer than > 1, we don''t have to account yet. 
- */ - if(list_has_one_entry(&gfn->list)) - { - d = get_domain_by_id(gfn->domain); - BUG_ON(!d); - if(sharing) - atomic_inc(&d->shr_pages); - else - atomic_dec(&d->shr_pages); - put_domain(d); - - return 1; - } - mem_sharing_audit(); - - return 0; -} int mem_sharing_debug_gref(struct domain *d, grant_ref_t ref) { @@ -450,8 +412,6 @@ int mem_sharing_nominate_page(struct dom mfn_t mfn; struct page_info *page; int ret; - shr_handle_t handle; - shr_hash_entry_t *hash_entry; struct gfn_info *gfn_info; *phandle = 0UL; @@ -467,7 +427,7 @@ int mem_sharing_nominate_page(struct dom /* Return the handle if the page is already shared */ page = mfn_to_page(mfn); if ( p2m_is_shared(p2mt) ) { - *phandle = page->shr_handle; + *phandle = page->shared_info->handle; ret = 0; goto out; } @@ -481,16 +441,26 @@ int mem_sharing_nominate_page(struct dom if ( ret ) goto out; - /* Create the handle */ + /* Initialize the shared state */ ret = -ENOMEM; - handle = next_handle++; - if((hash_entry = mem_sharing_hash_insert(handle, mfn)) == NULL) + if ( (page->shared_info = + xmalloc(struct page_sharing_info)) == NULL ) { + BUG_ON(page_make_private(d, page) != 0); goto out; } - if((gfn_info = mem_sharing_gfn_alloc()) == NULL) + page->shared_info->pg = page; + INIT_LIST_HEAD(&page->shared_info->gfns); + + /* Create the handle */ + page->shared_info->handle = next_handle++; + + /* Create the local gfn info */ + if ( (gfn_info = mem_sharing_gfn_alloc(page, d, gfn)) == NULL ) { - mem_sharing_hash_destroy(hash_entry); + xfree(page->shared_info); + page->shared_info = NULL; + BUG_ON(page_make_private(d, page) != 0); goto out; } @@ -501,23 +471,19 @@ int mem_sharing_nominate_page(struct dom * it a few lines above. * The mfn needs to revert back to rw type. This should never fail, * since no-one knew that the mfn was temporarily sharable */ + mem_sharing_gfn_destroy(d, gfn_info); + xfree(page->shared_info); + page->shared_info = NULL; + /* NOTE: We haven''t yet added this to the audit list. 
*/ BUG_ON(page_make_private(d, page) != 0); - mem_sharing_hash_destroy(hash_entry); - mem_sharing_gfn_destroy(gfn_info, 0); goto out; } /* Update m2p entry to SHARED_M2P_ENTRY */ set_gpfn_from_mfn(mfn_x(mfn), SHARED_M2P_ENTRY); - INIT_LIST_HEAD(&hash_entry->gfns); - INIT_LIST_HEAD(&gfn_info->list); - list_add(&gfn_info->list, &hash_entry->gfns); - gfn_info->gfn = gfn; - gfn_info->domain = d->domain_id; - page->shr_handle = handle; - *phandle = handle; - + *phandle = page->shared_info->handle; + audit_add_list(page); ret = 0; out: @@ -526,54 +492,82 @@ out: return ret; } -int mem_sharing_share_pages(shr_handle_t sh, shr_handle_t ch) +int mem_sharing_share_pages(struct domain *sd, unsigned long sgfn, shr_handle_t sh, + struct domain *cd, unsigned long cgfn, shr_handle_t ch) { - shr_hash_entry_t *se, *ce; struct page_info *spage, *cpage; struct list_head *le, *te; - struct gfn_info *gfn; + gfn_info_t *gfn; struct domain *d; - int ret; + int ret = -EINVAL; + mfn_t smfn, cmfn; + p2m_type_t smfn_type, cmfn_type; shr_lock(); + /* XXX if sd == cd handle potential deadlock by ordering + * the get_ and put_gfn''s */ + smfn = get_gfn(sd, sgfn, &smfn_type); + cmfn = get_gfn(cd, cgfn, &cmfn_type); + ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; - se = mem_sharing_hash_lookup(sh); - if(se == NULL) goto err_out; + spage = mem_sharing_lookup(mfn_x(smfn)); + if ( spage == NULL ) + goto err_out; + ASSERT(smfn_type == p2m_ram_shared); ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; - ce = mem_sharing_hash_lookup(ch); - if(ce == NULL) goto err_out; - spage = mfn_to_page(se->mfn); - cpage = mfn_to_page(ce->mfn); - /* gfn lists always have at least one entry => save to call list_entry */ - mem_sharing_gfn_account(gfn_get_info(&ce->gfns), 1); - mem_sharing_gfn_account(gfn_get_info(&se->gfns), 1); - list_for_each_safe(le, te, &ce->gfns) + cpage = mem_sharing_lookup(mfn_x(cmfn)); + if ( cpage == NULL ) + goto err_out; + ASSERT(cmfn_type == p2m_ram_shared); + + /* Check that the handles match */ + if ( spage->shared_info->handle != sh ) { - gfn = list_entry(le, struct gfn_info, list); - /* Get the source page and type, this should never fail - * because we are under shr lock, and got non-null se */ + ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; + goto err_out; + } + if ( cpage->shared_info->handle != ch ) + { + ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; + goto err_out; + } + + /* Merge the lists together */ + list_for_each_safe(le, te, &cpage->shared_info->gfns) + { + gfn = list_entry(le, gfn_info_t, list); + /* Get the source page and type, this should never fail: + * we are under shr lock, and got a successful lookup */ BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page)); - /* Move the gfn_info from ce list to se list */ + /* Move the gfn_info from client list to source list */ list_del(&gfn->list); + list_add(&gfn->list, &spage->shared_info->gfns); + put_page_and_type(cpage); d = get_domain_by_id(gfn->domain); BUG_ON(!d); - BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn) == 0); + BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn) == 0); put_domain(d); - list_add(&gfn->list, &se->gfns); - put_page_and_type(cpage); - } - ASSERT(list_empty(&ce->gfns)); - mem_sharing_hash_delete(ch); - atomic_inc(&nr_saved_mfns); + } + ASSERT(list_empty(&cpage->shared_info->gfns)); + + /* Clear the rest of the shared state */ + audit_del_list(cpage); + xfree(cpage->shared_info); + cpage->shared_info = NULL; + /* Free the client page */ if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) put_page(cpage); + + /* We 
managed to free a domain page. */ + atomic_inc(&nr_saved_mfns); ret = 0; err_out: + put_gfn(cd, cgfn); + put_gfn(sd, sgfn); shr_unlock(); - return ret; } @@ -585,13 +579,9 @@ int mem_sharing_unshare_page(struct doma mfn_t mfn; struct page_info *page, *old_page; void *s, *t; - int ret, last_gfn; - shr_hash_entry_t *hash_entry; - struct gfn_info *gfn_info = NULL; - shr_handle_t handle; + int last_gfn; + gfn_info_t *gfn_info = NULL; struct list_head *le; - - /* Remove the gfn_info from the list */ /* This is one of the reasons why we can''t enforce ordering * between shr_lock and p2m fine-grained locks in mm-lock. @@ -607,56 +597,62 @@ int mem_sharing_unshare_page(struct doma return 0; } - page = mfn_to_page(mfn); - handle = page->shr_handle; - - hash_entry = mem_sharing_hash_lookup(handle); - list_for_each(le, &hash_entry->gfns) + page = mem_sharing_lookup(mfn_x(mfn)); + if ( page == NULL ) { - gfn_info = list_entry(le, struct gfn_info, list); + gdprintk(XENLOG_ERR, "Domain p2m is shared, but page is not: " + "%lx\n", gfn); + BUG(); + } + + list_for_each(le, &page->shared_info->gfns) + { + gfn_info = list_entry(le, gfn_info_t, list); if ( (gfn_info->gfn == gfn) && (gfn_info->domain == d->domain_id) ) goto gfn_found; } gdprintk(XENLOG_ERR, "Could not find gfn_info for shared gfn: " "%lx\n", gfn); BUG(); -gfn_found: - /* Delete gfn_info from the list, but hold on to it, until we''ve allocated - * memory to make a copy */ - list_del(&gfn_info->list); - last_gfn = list_empty(&hash_entry->gfns); +gfn_found: + /* Do the accounting first. If anything fails below, we have bigger + * bigger fish to fry. First, remove the gfn from the list. */ + last_gfn = list_has_one_entry(&page->shared_info->gfns); + mem_sharing_gfn_destroy(d, gfn_info); + if ( last_gfn ) + { + /* Clean up shared state */ + audit_del_list(page); + xfree(page->shared_info); + page->shared_info = NULL; + } + else + atomic_dec(&nr_saved_mfns); /* If the GFN is getting destroyed drop the references to MFN * (possibly freeing the page), and exit early */ if ( flags & MEM_SHARING_DESTROY_GFN ) { - mem_sharing_gfn_destroy(gfn_info, !last_gfn); - if(last_gfn) - mem_sharing_hash_delete(handle); - else - /* Even though we don''t allocate a private page, we have to account - * for the MFN that originally backed this PFN. */ - atomic_dec(&nr_saved_mfns); put_gfn(d, gfn); shr_unlock(); put_page_and_type(page); - if(last_gfn && - test_and_clear_bit(_PGC_allocated, &page->count_info)) + if ( last_gfn && + test_and_clear_bit(_PGC_allocated, &page->count_info) ) put_page(page); + return 0; } - ret = page_make_private(d, page); - BUG_ON(last_gfn & ret); - if(ret == 0) goto private_page_found; - + if ( last_gfn ) + { + BUG_ON(page_make_private(d, page) != 0); + goto private_page_found; + } + old_page = page; page = alloc_domheap_page(d, 0); - if(!page) + if ( !page ) { - /* We''ve failed to obtain memory for private page. Need to re-add the - * gfn_info to relevant list */ - list_add(&gfn_info->list, &hash_entry->gfns); put_gfn(d, gfn); mem_sharing_notify_helper(d, gfn); shr_unlock(); @@ -669,30 +665,18 @@ gfn_found: unmap_domain_page(s); unmap_domain_page(t); - /* NOTE: set_shared_p2m_entry will switch the underlying mfn. 
If - * we do get_page withing get_gfn, the correct sequence here - * should be - get_page(page); - put_page(old_page); - * so that the ref to the old page is dropped, and a ref to - * the new page is obtained to later be dropped in put_gfn */ BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0); put_page_and_type(old_page); private_page_found: - /* We''ve got a private page, we can commit the gfn destruction */ - mem_sharing_gfn_destroy(gfn_info, !last_gfn); - if(last_gfn) - mem_sharing_hash_delete(handle); - else - atomic_dec(&nr_saved_mfns); - if ( p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != p2m_ram_shared ) { - printk("Could not change p2m type.\n"); + gdprintk(XENLOG_ERR, "Could not change p2m type d %hu gfn %lx.\n", + d->domain_id, gfn); BUG(); } + /* Update m2p entry */ set_gpfn_from_mfn(mfn_x(page_to_mfn(page)), gfn); @@ -749,9 +733,18 @@ int mem_sharing_domctl(struct domain *d, case XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE: { - shr_handle_t sh = mec->u.share.source_handle; - shr_handle_t ch = mec->u.share.client_handle; - rc = mem_sharing_share_pages(sh, ch); + unsigned long sgfn = mec->u.share.source_gfn; + shr_handle_t sh = mec->u.share.source_handle; + struct domain *cd = get_domain_by_id(mec->u.share.client_domain); + if ( cd ) + { + unsigned long cgfn = mec->u.share.client_gfn; + shr_handle_t ch = mec->u.share.client_handle; + rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); + put_domain(cd); + } + else + return -EEXIST; } break; @@ -799,6 +792,9 @@ int mem_sharing_domctl(struct domain *d, void __init mem_sharing_init(void) { printk("Initing memory sharing.\n"); - mem_sharing_hash_init(); + mm_lock_init(&shr_lock); +#if MEM_SHARING_AUDIT + INIT_LIST_HEAD(&shr_audit_list); +#endif } diff -r 302a58874eb1 -r 2a4112172e44 xen/include/asm-x86/mem_sharing.h --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -22,13 +22,28 @@ #ifndef __MEM_SHARING_H__ #define __MEM_SHARING_H__ +#include <public/domctl.h> + +/* Auditing of memory sharing code? */ +#define MEM_SHARING_AUDIT 0 + +typedef uint64_t shr_handle_t; + +struct page_sharing_info +{ + struct page_info *pg; /* Back pointer to the page. */ + shr_handle_t handle; /* Globally unique version / handle. */ +#if MEM_SHARING_AUDIT + struct list_head entry; /* List of all shared pages (entry). */ +#endif + struct list_head gfns; /* List of domains and gfns for this page (head). */ +}; + #ifdef __x86_64__ #define sharing_supported(_d) \ (is_hvm_domain(_d) && paging_mode_hap(_d)) -typedef uint64_t shr_handle_t; - unsigned int mem_sharing_get_nr_saved_mfns(void); int mem_sharing_nominate_page(struct domain *d, unsigned long gfn, diff -r 302a58874eb1 -r 2a4112172e44 xen/include/asm-x86/mm.h --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -31,6 +31,8 @@ struct page_list_entry __pdx_t next, prev; }; +struct page_sharing_info; + struct page_info { union { @@ -49,8 +51,13 @@ struct page_info /* For non-pinnable single-page shadows, a higher entry that points * at us. */ paddr_t up; - /* For shared/sharable pages the sharing handle */ - uint64_t shr_handle; + /* For shared/sharable pages, we use a doubly-linked list + * of all the {pfn,domain} pairs that map this page. We also include + * an opaque handle, which is effectively a version, so that clients + * of sharing share the version they expect to. + * This list is allocated and freed when a page is shared/unshared. + */ + struct page_sharing_info *shared_info; }; /* Reference count and various PGC_xxx flags and fields. 
*/ diff -r 302a58874eb1 -r 2a4112172e44 xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -789,7 +789,10 @@ struct xen_domctl_mem_sharing_op { uint64_aligned_t handle; /* OUT: the handle */ } nominate; struct mem_sharing_op_share { /* OP_SHARE */ + uint64_aligned_t source_gfn; /* IN: the gfn of the source page */ uint64_aligned_t source_handle; /* IN: handle to the source page */ + domid_t client_domain; /* IN: the client domain id */ + uint64_aligned_t client_gfn; /* IN: the client gfn */ uint64_aligned_t client_handle; /* IN: handle to the client page */ } share; struct mem_sharing_op_debug { /* OP_DEBUG_xxx */
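For illustration, the shape of the revised share call from the toolstack side would be roughly the sketch below. The domctl command name and the xc_domctl() wrapper are assumptions made for the example; the u.share fields are exactly the ones this patch adds to domctl.h:

    #include <string.h>
    #include <xenctrl.h>

    /* Minimal sketch: invoke OP_SHARE with the new (gfn, handle) pairs.
     * XEN_DOMCTL_mem_sharing_op and xc_domctl() are assumed names. */
    static int share_pages(xc_interface *xch,
                           domid_t sdom, uint64_t sgfn, uint64_t sh,
                           domid_t cdom, uint64_t cgfn, uint64_t ch)
    {
        struct xen_domctl domctl;

        memset(&domctl, 0, sizeof(domctl));
        domctl.cmd    = XEN_DOMCTL_mem_sharing_op;  /* assumed name */
        domctl.domain = sdom;                       /* the source domain */
        domctl.u.mem_sharing_op.op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE;
        domctl.u.mem_sharing_op.u.share.source_gfn    = sgfn;
        domctl.u.mem_sharing_op.u.share.source_handle = sh;
        domctl.u.mem_sharing_op.u.share.client_domain = cdom;
        domctl.u.mem_sharing_op.u.share.client_gfn    = cgfn;
        domctl.u.mem_sharing_op.u.share.client_handle = ch;

        return xc_domctl(xch, &domctl);             /* assumed wrapper */
    }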
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 02 of 12] x86/mm: Update mem sharing interface to (re)allow sharing of grants
xen/arch/x86/mm/mem_sharing.c | 57 ++++++++++++++++++++++++++++++++++++------
xen/include/public/domctl.h | 9 ++++++
2 files changed, 57 insertions(+), 9 deletions(-)

Previously, the mem sharing code would return an opaque handle to index shared pages (and nominees) in its global hash table. With the hash table removed, the new interface requires a gfn and a version. However, when sharing grants, the caller provides a grant ref and a version. Update the interface to handle this case.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 2a4112172e44 -r e0d8333e4725 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -733,18 +733,57 @@ int mem_sharing_domctl(struct domain *d, case XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE: { - unsigned long sgfn = mec->u.share.source_gfn; - shr_handle_t sh = mec->u.share.source_handle; - struct domain *cd = get_domain_by_id(mec->u.share.client_domain); - if ( cd ) + unsigned long sgfn, cgfn; + struct domain *cd; + shr_handle_t sh, ch; + + if ( !mem_sharing_enabled(d) ) + return -EINVAL; + + cd = get_domain_by_id(mec->u.share.client_domain); + if ( !cd ) + return -ESRCH; + + if ( !mem_sharing_enabled(cd) ) { - unsigned long cgfn = mec->u.share.client_gfn; - shr_handle_t ch = mec->u.share.client_handle; - rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); put_domain(cd); + return -EINVAL; } - else - return -EEXIST; + + if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.source_gfn) ) + { + grant_ref_t gref = (grant_ref_t) + (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF( + mec->u.share.source_gfn)); + if ( mem_sharing_gref_to_gfn(d, gref, &sgfn) < 0 ) + { + put_domain(cd); + return -EINVAL; + } + } else { + sgfn = mec->u.share.source_gfn; + } + + if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) ) + { + grant_ref_t gref = (grant_ref_t) + (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF( + mec->u.share.client_gfn)); + if ( mem_sharing_gref_to_gfn(cd, gref, &cgfn) < 0 ) + { + put_domain(cd); + return -EINVAL; + } + } else { + cgfn = mec->u.share.client_gfn; + } + + sh = mec->u.share.source_handle; + ch = mec->u.share.client_handle; + + rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); + + put_domain(cd); } break; diff -r 2a4112172e44 -r e0d8333e4725 xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -775,6 +775,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_mem_e #define XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID (-10) #define XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID (-9) +#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG (1ULL << 62) + +#define XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(field, val) \ (field) = (XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG | val) +#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(field) \ ((field) & XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG) +#define XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(field) \ ((field) & (~XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG)) + struct xen_domctl_mem_sharing_op { uint8_t op; /* XEN_DOMCTL_MEM_EVENT_OP_* */
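For illustration, a caller sharing by grant reference rather than by gfn would encode the field as in this sketch (values are placeholders):

    /* Sketch: bit 62 marks the 64-bit gfn field as carrying a grant ref;
     * the hypervisor strips the flag and translates the gref to a gfn
     * via mem_sharing_gref_to_gfn(). */
    uint64_t field;
    grant_ref_t gref = 42;   /* placeholder value */

    XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(field, gref);
    /* ... place 'field' in u.share.source_gfn or u.share.client_gfn ... */

    if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(field) )
        gref = (grant_ref_t)XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(field);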
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 03 of 12] x86/mm: Add per-page locking for memory sharing, when audits are disabled
xen/arch/x86/mm.c | 74 +----------- xen/arch/x86/mm/mem_sharing.c | 257 ++++++++++++++++++++++++++++++++++++++--- xen/arch/x86/mm/mm-locks.h | 15 +- xen/include/asm-x86/mm.h | 27 +++- 4 files changed, 275 insertions(+), 98 deletions(-) With the removal of the hash table, all that is needed now is locking of individual shared pages, as new (gfn,domain) pairs are removed or added from the list of mappings. We recycle PGT_locked and use it to lock individual pages. We ensure deadlock is averted by locking pages in increasing order. The global lock remains for the benefit of the auditing code, and is thus enabled only as a compile-time option. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r e0d8333e4725 -r 11916fe20dd2 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1719,7 +1719,7 @@ static int free_l4_table(struct page_inf #define free_l4_table(page, preemptible) (-EINVAL) #endif -static int page_lock(struct page_info *page) +int page_lock(struct page_info *page) { unsigned long x, nx; @@ -1736,7 +1736,7 @@ static int page_lock(struct page_info *p return 1; } -static void page_unlock(struct page_info *page) +void page_unlock(struct page_info *page) { unsigned long x, nx, y = page->u.inuse.type_info; @@ -4295,76 +4295,6 @@ int steal_page( return -1; } -int page_make_sharable(struct domain *d, - struct page_info *page, - int expected_refcnt) -{ - spin_lock(&d->page_alloc_lock); - - /* Change page type and count atomically */ - if ( !get_page_and_type(page, d, PGT_shared_page) ) - { - spin_unlock(&d->page_alloc_lock); - return -EINVAL; - } - - /* Check it wasn''t already sharable and undo if it was */ - if ( (page->u.inuse.type_info & PGT_count_mask) != 1 ) - { - put_page_and_type(page); - spin_unlock(&d->page_alloc_lock); - return -EEXIST; - } - - /* Check if the ref count is 2. 
The first from PGC_allocated, and - * the second from get_page_and_type at the top of this function */ - if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) ) - { - /* Return type count back to zero */ - put_page_and_type(page); - spin_unlock(&d->page_alloc_lock); - return -E2BIG; - } - - page_set_owner(page, dom_cow); - d->tot_pages--; - page_list_del(page, &d->page_list); - spin_unlock(&d->page_alloc_lock); - return 0; -} - -int page_make_private(struct domain *d, struct page_info *page) -{ - if ( !get_page(page, dom_cow) ) - return -EINVAL; - - spin_lock(&d->page_alloc_lock); - - /* We can only change the type if count is one */ - if ( (page->u.inuse.type_info & (PGT_type_mask | PGT_count_mask)) - != (PGT_shared_page | 1) ) - { - put_page(page); - spin_unlock(&d->page_alloc_lock); - return -EEXIST; - } - - /* Drop the final typecount */ - put_page_and_type(page); - - /* Change the owner */ - ASSERT(page_get_owner(page) == dom_cow); - page_set_owner(page, d); - - d->tot_pages++; - page_list_add_tail(page, &d->page_list); - spin_unlock(&d->page_alloc_lock); - - put_page(page); - - return 0; -} - static int __do_update_va_mapping( unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner) { diff -r e0d8333e4725 -r 11916fe20dd2 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -35,10 +35,21 @@ #include "mm-locks.h" +static shr_handle_t next_handle = 1; + #if MEM_SHARING_AUDIT + +static mm_lock_t shr_lock; + +#define shr_lock() _shr_lock() +#define shr_unlock() _shr_unlock() +#define shr_locked_by_me() _shr_locked_by_me() + static int mem_sharing_audit(void); + #define MEM_SHARING_DEBUG(_f, _a...) \ debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a) + static struct list_head shr_audit_list; static inline void audit_add_list(struct page_info *page) @@ -51,7 +62,21 @@ static inline void audit_del_list(struct { list_del(&page->shared_info->entry); } + +static inline int mem_sharing_page_lock(struct page_info *p) +{ + return 1; +} +#define mem_sharing_page_unlock(p) ((void)0) + +#define get_next_handle() next_handle++; #else + +#define shr_lock() ((void)0) +#define shr_unlock() ((void)0) +/* Only used inside audit code */ +//#define shr_locked_by_me() ((void)0) + static inline int mem_sharing_audit(void) { return 0; @@ -59,6 +84,34 @@ static inline int mem_sharing_audit(void #define audit_add_list(p) ((void)0) #define audit_del_list(p) ((void)0) + +static inline int mem_sharing_page_lock(struct page_info *pg) +{ + int rc; + rc = page_lock(pg); + if ( rc ) + { + preempt_disable(); + } + return rc; +} + +static inline void mem_sharing_page_unlock(struct page_info *pg) +{ + preempt_enable(); + page_unlock(pg); +} + +static inline shr_handle_t get_next_handle(void) +{ + /* Get the next handle get_page style */ + uint64_t x, y = next_handle; + do { + x = y; + } + while ( (y = cmpxchg(&next_handle, x, x + 1)) != x ); + return x + 1; +} #endif /* MEM_SHARING_AUDIT */ #define mem_sharing_enabled(d) \ @@ -71,7 +124,6 @@ static inline int mem_sharing_audit(void #undef page_to_mfn #define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg)) -static shr_handle_t next_handle = 1; static atomic_t nr_saved_mfns = ATOMIC_INIT(0); typedef struct gfn_info @@ -81,8 +133,6 @@ typedef struct gfn_info struct list_head list; } gfn_info_t; -static mm_lock_t shr_lock; - /* Returns true if list has only one entry. O(1) complexity. 
*/ static inline int list_has_one_entry(struct list_head *head) { @@ -403,6 +453,113 @@ int mem_sharing_debug_gref(struct domain return mem_sharing_debug_gfn(d, gfn); } +/* Functions that change a page''s type and ownership */ +static int page_make_sharable(struct domain *d, + struct page_info *page, + int expected_refcnt) +{ + spin_lock(&d->page_alloc_lock); + + /* Change page type and count atomically */ + if ( !get_page_and_type(page, d, PGT_shared_page) ) + { + spin_unlock(&d->page_alloc_lock); + return -EINVAL; + } + + /* Check it wasn''t already sharable and undo if it was */ + if ( (page->u.inuse.type_info & PGT_count_mask) != 1 ) + { + put_page_and_type(page); + spin_unlock(&d->page_alloc_lock); + return -EEXIST; + } + + /* Check if the ref count is 2. The first from PGC_allocated, and + * the second from get_page_and_type at the top of this function */ + if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) ) + { + /* Return type count back to zero */ + put_page_and_type(page); + spin_unlock(&d->page_alloc_lock); + return -E2BIG; + } + + page_set_owner(page, dom_cow); + d->tot_pages--; + page_list_del(page, &d->page_list); + spin_unlock(&d->page_alloc_lock); + return 0; +} + +static int page_make_private(struct domain *d, struct page_info *page) +{ + unsigned long expected_type; + + if ( !get_page(page, dom_cow) ) + return -EINVAL; + + spin_lock(&d->page_alloc_lock); + + /* We can only change the type if count is one */ + /* If we are locking pages individually, then we need to drop + * the lock here, while the page is typed. We cannot risk the + * race of page_unlock and then put_page_type. */ +#if MEM_SHARING_AUDIT + expected_type = (PGT_shared_page | PGT_validated | 1); +#else + expected_type = (PGT_shared_page | PGT_validated | PGT_locked | 2); +#endif + if ( page->u.inuse.type_info != expected_type ) + { + put_page(page); + spin_unlock(&d->page_alloc_lock); + return -EEXIST; + } + + /* Drop the final typecount */ + put_page_and_type(page); + +#ifndef MEM_SHARING_AUDIT + /* Now that we''ve dropped the type, we can unlock */ + mem_sharing_page_unlock(page); +#endif + + /* Change the owner */ + ASSERT(page_get_owner(page) == dom_cow); + page_set_owner(page, d); + + d->tot_pages++; + page_list_add_tail(page, &d->page_list); + spin_unlock(&d->page_alloc_lock); + + put_page(page); + + return 0; +} + +static inline struct page_info *__grab_shared_page(mfn_t mfn) +{ + struct page_info *pg = NULL; + + if ( !mfn_valid(mfn) ) + return NULL; + pg = mfn_to_page(mfn); + + /* If the page is not validated we can''t lock it, and if it''s + * not validated it''s obviously not shared. */ + if ( !mem_sharing_page_lock(pg) ) + return NULL; + + if ( mem_sharing_lookup(mfn_x(mfn)) == NULL ) + { + mem_sharing_page_unlock(pg); + return NULL; + } + + return pg; +} + int mem_sharing_nominate_page(struct domain *d, unsigned long gfn, int expected_refcnt, @@ -410,7 +567,7 @@ int mem_sharing_nominate_page(struct dom { p2m_type_t p2mt; mfn_t mfn; - struct page_info *page; + struct page_info *page = NULL; /* gcc... 
*/ int ret; struct gfn_info *gfn_info; @@ -425,10 +582,17 @@ int mem_sharing_nominate_page(struct dom goto out; /* Return the handle if the page is already shared */ - page = mfn_to_page(mfn); if ( p2m_is_shared(p2mt) ) { - *phandle = page->shared_info->handle; + struct page_info *pg = __grab_shared_page(mfn); + if ( !pg ) + { + gdprintk(XENLOG_ERR, "Shared p2m entry gfn %lx, but could not " + "grab page %lx dom %d\n", gfn, mfn_x(mfn), d->domain_id); + BUG(); + } + *phandle = pg->shared_info->handle; ret = 0; + mem_sharing_page_unlock(pg); goto out; } @@ -437,15 +601,24 @@ int mem_sharing_nominate_page(struct dom goto out; /* Try to convert the mfn to the sharable type */ + page = mfn_to_page(mfn); ret = page_make_sharable(d, page, expected_refcnt); if ( ret ) goto out; + /* Now that the page is validated, we can lock it. There is no + * race because we''re holding the p2m entry, so no one else + * could be nominating this gfn */ + ret = -ENOENT; + if ( !mem_sharing_page_lock(page) ) + goto out; + /* Initialize the shared state */ ret = -ENOMEM; if ( (page->shared_info = xmalloc(struct page_sharing_info)) == NULL ) { + /* Making a page private atomically unlocks it */ BUG_ON(page_make_private(d, page) != 0); goto out; } @@ -453,7 +626,7 @@ int mem_sharing_nominate_page(struct dom INIT_LIST_HEAD(&page->shared_info->gfns); /* Create the handle */ - page->shared_info->handle = next_handle++; + page->shared_info->handle = get_next_handle(); /* Create the local gfn info */ if ( (gfn_info = mem_sharing_gfn_alloc(page, d, gfn)) == NULL ) @@ -484,6 +657,7 @@ int mem_sharing_nominate_page(struct dom *phandle = page->shared_info->handle; audit_add_list(page); + mem_sharing_page_unlock(page); ret = 0; out: @@ -495,7 +669,7 @@ out: int mem_sharing_share_pages(struct domain *sd, unsigned long sgfn, shr_handle_t sh, struct domain *cd, unsigned long cgfn, shr_handle_t ch) { - struct page_info *spage, *cpage; + struct page_info *spage, *cpage, *firstpg, *secondpg; struct list_head *le, *te; gfn_info_t *gfn; struct domain *d; @@ -510,26 +684,63 @@ int mem_sharing_share_pages(struct domai smfn = get_gfn(sd, sgfn, &smfn_type); cmfn = get_gfn(cd, cgfn, &cmfn_type); - ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; - spage = mem_sharing_lookup(mfn_x(smfn)); - if ( spage == NULL ) + /* This tricky business is to avoid two callers deadlocking if + * grabbing pages in opposite client/source order */ + if( mfn_x(smfn) == mfn_x(cmfn) ) + { + /* The pages are already the same. We could return some + * kind of error here, but no matter how you look at it, + * the pages are already ''shared''. It possibly represents + * a big problem somewhere else, but as far as sharing is + * concerned: great success! 
*/ + ret = 0; goto err_out; + } + else if ( mfn_x(smfn) < mfn_x(cmfn) ) + { + ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; + spage = firstpg = __grab_shared_page(smfn); + if ( spage == NULL ) + goto err_out; + + ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; + cpage = secondpg = __grab_shared_page(cmfn); + if ( cpage == NULL ) + { + mem_sharing_page_unlock(spage); + goto err_out; + } + } else { + ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; + cpage = firstpg = __grab_shared_page(cmfn); + if ( cpage == NULL ) + goto err_out; + + ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; + spage = secondpg = __grab_shared_page(smfn); + if ( spage == NULL ) + { + mem_sharing_page_unlock(cpage); + goto err_out; + } + } + ASSERT(smfn_type == p2m_ram_shared); - ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; - cpage = mem_sharing_lookup(mfn_x(cmfn)); - if ( cpage == NULL ) - goto err_out; ASSERT(cmfn_type == p2m_ram_shared); /* Check that the handles match */ if ( spage->shared_info->handle != sh ) { ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; + mem_sharing_page_unlock(secondpg); + mem_sharing_page_unlock(firstpg); goto err_out; } if ( cpage->shared_info->handle != ch ) { ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; + mem_sharing_page_unlock(secondpg); + mem_sharing_page_unlock(firstpg); goto err_out; } @@ -556,6 +767,9 @@ int mem_sharing_share_pages(struct domai xfree(cpage->shared_info); cpage->shared_info = NULL; + mem_sharing_page_unlock(secondpg); + mem_sharing_page_unlock(firstpg); + /* Free the client page */ if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) put_page(cpage); @@ -597,7 +811,7 @@ int mem_sharing_unshare_page(struct doma return 0; } - page = mem_sharing_lookup(mfn_x(mfn)); + page = __grab_shared_page(mfn); if ( page == NULL ) { gdprintk(XENLOG_ERR, "Domain p2m is shared, but page is not: " @@ -633,18 +847,20 @@ gfn_found: * (possibly freeing the page), and exit early */ if ( flags & MEM_SHARING_DESTROY_GFN ) { - put_gfn(d, gfn); - shr_unlock(); put_page_and_type(page); + mem_sharing_page_unlock(page); if ( last_gfn && test_and_clear_bit(_PGC_allocated, &page->count_info) ) put_page(page); + put_gfn(d, gfn); + shr_unlock(); return 0; } if ( last_gfn ) { + /* Making a page private atomically unlocks it */ BUG_ON(page_make_private(d, page) != 0); goto private_page_found; } @@ -653,6 +869,7 @@ gfn_found: page = alloc_domheap_page(d, 0); if ( !page ) { + mem_sharing_page_unlock(old_page); put_gfn(d, gfn); mem_sharing_notify_helper(d, gfn); shr_unlock(); @@ -666,6 +883,7 @@ gfn_found: unmap_domain_page(t); BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0); + mem_sharing_page_unlock(old_page); put_page_and_type(old_page); private_page_found: @@ -683,6 +901,7 @@ private_page_found: /* Now that the gfn<->mfn map is properly established, * marking dirty is feasible */ paging_mark_dirty(d, mfn_x(page_to_mfn(page))); + /* We do not need to unlock a private page */ put_gfn(d, gfn); shr_unlock(); return 0; @@ -831,8 +1050,8 @@ int mem_sharing_domctl(struct domain *d, void __init mem_sharing_init(void) { printk("Initing memory sharing.\n"); +#if MEM_SHARING_AUDIT mm_lock_init(&shr_lock); -#if MEM_SHARING_AUDIT INIT_LIST_HEAD(&shr_audit_list); #endif } diff -r e0d8333e4725 -r 11916fe20dd2 xen/arch/x86/mm/mm-locks.h --- a/xen/arch/x86/mm/mm-locks.h +++ b/xen/arch/x86/mm/mm-locks.h @@ -26,6 +26,8 @@ #ifndef _MM_LOCKS_H #define _MM_LOCKS_H +#include <asm/mem_sharing.h> + /* Per-CPU variable for enforcing the lock ordering */ DECLARE_PER_CPU(int, mm_lock_level); #define 
__get_lock_level() (this_cpu(mm_lock_level)) @@ -141,15 +143,22 @@ static inline void mm_enforce_order_unlo * * ************************************************************************/ +#if MEM_SHARING_AUDIT /* Page-sharing lock (global) * * A single global lock that protects the memory-sharing code''s * hash tables. */ declare_mm_lock(shr) -#define shr_lock() mm_lock(shr, &shr_lock) -#define shr_unlock() mm_unlock(&shr_lock) -#define shr_locked_by_me() mm_locked_by_me(&shr_lock) +#define _shr_lock() mm_lock(shr, &shr_lock) +#define _shr_unlock() mm_unlock(&shr_lock) +#define _shr_locked_by_me() mm_locked_by_me(&shr_lock) + +#else + +/* We use an efficient per-page lock when AUDIT is not enabled. */ + +#endif /* MEM_SHARING_AUDIT */ /* Nested P2M lock (per-domain) * diff -r e0d8333e4725 -r 11916fe20dd2 xen/include/asm-x86/mm.h --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -337,6 +337,29 @@ int is_iomem_page(unsigned long mfn); void clear_superpage_mark(struct page_info *page); +/* Per page locks: + * page_lock() is used for two purposes: pte serialization, and memory sharing. + * + * All users of page lock for pte serialization live in mm.c, use it + * to lock a page table page during pte updates, do not take other locks within + * the critical section delimited by page_lock/unlock, and perform no + * nesting. + * + * All users of page lock for memory sharing live in mm/mem_sharing.c. Page_lock + * is used in memory sharing to protect addition (share) and removal (unshare) + * of (gfn,domain) tupples to a list of gfn''s that the shared page is currently + * backing. Nesting may happen when sharing (and locking) two pages -- deadlock + * is avoided by locking pages in increasing order. + * Memory sharing may take the p2m_lock within a page_lock/unlock + * critical section. + * + * These two users (pte serialization and memory sharing) do not collide, since + * sharing is only supported for hvm guests, which do not perform pv pte updates. + * + */ +int page_lock(struct page_info *page); +void page_unlock(struct page_info *page); + struct domain *page_get_owner_and_reference(struct page_info *page); void put_page(struct page_info *page); int get_page(struct page_info *page, struct domain *domain); @@ -588,10 +611,6 @@ int steal_page( struct domain *d, struct page_info *page, unsigned int memflags); int donate_page( struct domain *d, struct page_info *page, unsigned int memflags); -int page_make_sharable(struct domain *d, - struct page_info *page, - int expected_refcnt); -int page_make_private(struct domain *d, struct page_info *page); int map_ldt_shadow_page(unsigned int);
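The deadlock-avoidance rule used by mem_sharing_share_pages() above boils down to the following idiom. This is a condensed sketch rather than code from the patch, and assumes the equal-MFN case has already been filtered out:

    /* Sketch: when two shared pages must be held at once, always lock the
     * lower-numbered MFN first, so that concurrent sharers grabbing the
     * same pair in opposite client/source order cannot deadlock. */
    static int lock_shared_pair(mfn_t a, mfn_t b,
                                struct page_info **first,
                                struct page_info **second)
    {
        mfn_t lo = (mfn_x(a) < mfn_x(b)) ? a : b;
        mfn_t hi = (mfn_x(a) < mfn_x(b)) ? b : a;

        if ( (*first = __grab_shared_page(lo)) == NULL )
            return -1;
        if ( (*second = __grab_shared_page(hi)) == NULL )
        {
            mem_sharing_page_unlock(*first);
            return -1;
        }
        return 0;
    }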
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 04 of 12] x86/mm: Enforce lock ordering for sharing page locks
xen/arch/x86/mm/mem_sharing.c | 91 ++++++++++++++++++++++++++----------------
xen/arch/x86/mm/mm-locks.h | 18 ++++++++-
xen/include/asm-x86/mm.h | 3 +-
3 files changed, 76 insertions(+), 36 deletions(-)

Use the ordering constructs in mm-locks.h to enforce an order for the p2m and page locks in the sharing code. Applies to either the global sharing lock (in audit mode) or the per-page locks.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 11916fe20dd2 -r c906c616d5ac xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -37,6 +37,14 @@ static shr_handle_t next_handle = 1; +typedef struct pg_lock_data { + int mm_unlock_level; + unsigned short recurse_count; +} pg_lock_data_t; + +#define DECLARE_PG_LOCK_DATA(name) \ + pg_lock_data_t name = { 0, 0 }; + #if MEM_SHARING_AUDIT static mm_lock_t shr_lock; @@ -63,11 +71,12 @@ static inline void audit_del_list(struct list_del(&page->shared_info->entry); } -static inline int mem_sharing_page_lock(struct page_info *p) +static inline int mem_sharing_page_lock(struct page_info *p, + pg_lock_data_t *l) { return 1; } -#define mem_sharing_page_unlock(p) ((void)0) +#define mem_sharing_page_unlock(p, l) ((void)0) #define get_next_handle() next_handle++; #else @@ -85,19 +94,26 @@ static inline int mem_sharing_audit(void #define audit_add_list(p) ((void)0) #define audit_del_list(p) ((void)0) -static inline int mem_sharing_page_lock(struct page_info *pg) +static inline int mem_sharing_page_lock(struct page_info *pg, + pg_lock_data_t *pld) { int rc; + page_sharing_mm_pre_lock(); rc = page_lock(pg); if ( rc ) { preempt_disable(); + page_sharing_mm_post_lock(&pld->mm_unlock_level, + &pld->recurse_count); } return rc; } -static inline void mem_sharing_page_unlock(struct page_info *pg) +static inline void mem_sharing_page_unlock(struct page_info *pg, + pg_lock_data_t *pld) { + page_sharing_mm_unlock(pld->mm_unlock_level, + &pld->recurse_count); preempt_enable(); page_unlock(pg); } @@ -492,7 +508,8 @@ static int page_make_sharable(struct dom return 0; } -static int page_make_private(struct domain *d, struct page_info *page) +static int page_make_private(struct domain *d, struct page_info *page, + pg_lock_data_t *pld) { unsigned long expected_type; @@ -520,9 +537,11 @@ static int page_make_private(struct doma /* Drop the final typecount */ put_page_and_type(page); -#ifndef MEM_SHARING_AUDIT +#if MEM_SHARING_AUDIT + (void)pld; +#else /* Now that we've dropped the type, we can unlock */ - mem_sharing_page_unlock(page); + mem_sharing_page_unlock(page, pld); #endif /* Change the owner */ @@ -538,7 +557,8 @@ static int page_make_private(struct doma return 0; } -static inline struct page_info *__grab_shared_page(mfn_t mfn) +static inline struct page_info *__grab_shared_page(mfn_t mfn, + pg_lock_data_t *pld) { struct page_info *pg = NULL; @@ -548,12 +568,12 @@ static inline struct page_info *__grab_s /* If the page is not validated we can't lock it, and if it's * not validated it's obviously not shared. */ - if ( !mem_sharing_page_lock(pg) ) + if ( !mem_sharing_page_lock(pg, pld) ) return NULL; if ( mem_sharing_lookup(mfn_x(mfn)) == NULL ) { - mem_sharing_page_unlock(pg); + mem_sharing_page_unlock(pg, pld); return NULL; } @@ -570,6 +590,7 @@ int mem_sharing_nominate_page(struct dom struct page_info *page = NULL; /* gcc... 
*/ int ret; struct gfn_info *gfn_info; + DECLARE_PG_LOCK_DATA(pld); *phandle = 0UL; @@ -583,7 +604,7 @@ int mem_sharing_nominate_page(struct dom /* Return the handle if the page is already shared */ if ( p2m_is_shared(p2mt) ) { - struct page_info *pg = __grab_shared_page(mfn); + struct page_info *pg = __grab_shared_page(mfn, &pld); if ( !pg ) { gdprintk(XENLOG_ERR, "Shared p2m entry gfn %lx, but could not " @@ -592,7 +613,7 @@ int mem_sharing_nominate_page(struct dom } *phandle = pg->shared_info->handle; ret = 0; - mem_sharing_page_unlock(pg); + mem_sharing_page_unlock(pg, &pld); goto out; } @@ -610,7 +631,7 @@ int mem_sharing_nominate_page(struct dom * race because we''re holding the p2m entry, so no one else * could be nominating this gfn */ ret = -ENOENT; - if ( !mem_sharing_page_lock(page) ) + if ( !mem_sharing_page_lock(page, &pld) ) goto out; /* Initialize the shared state */ @@ -619,7 +640,7 @@ int mem_sharing_nominate_page(struct dom xmalloc(struct page_sharing_info)) == NULL ) { /* Making a page private atomically unlocks it */ - BUG_ON(page_make_private(d, page) != 0); + BUG_ON(page_make_private(d, page, &pld) != 0); goto out; } page->shared_info->pg = page; @@ -633,7 +654,7 @@ int mem_sharing_nominate_page(struct dom { xfree(page->shared_info); page->shared_info = NULL; - BUG_ON(page_make_private(d, page) != 0); + BUG_ON(page_make_private(d, page, &pld) != 0); goto out; } @@ -648,7 +669,7 @@ int mem_sharing_nominate_page(struct dom xfree(page->shared_info); page->shared_info = NULL; /* NOTE: We haven''t yet added this to the audit list. */ - BUG_ON(page_make_private(d, page) != 0); + BUG_ON(page_make_private(d, page, &pld) != 0); goto out; } @@ -657,7 +678,7 @@ int mem_sharing_nominate_page(struct dom *phandle = page->shared_info->handle; audit_add_list(page); - mem_sharing_page_unlock(page); + mem_sharing_page_unlock(page, &pld); ret = 0; out: @@ -676,6 +697,7 @@ int mem_sharing_share_pages(struct domai int ret = -EINVAL; mfn_t smfn, cmfn; p2m_type_t smfn_type, cmfn_type; + DECLARE_PG_LOCK_DATA(pld); shr_lock(); @@ -699,28 +721,28 @@ int mem_sharing_share_pages(struct domai else if ( mfn_x(smfn) < mfn_x(cmfn) ) { ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; - spage = firstpg = __grab_shared_page(smfn); + spage = firstpg = __grab_shared_page(smfn, &pld); if ( spage == NULL ) goto err_out; ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; - cpage = secondpg = __grab_shared_page(cmfn); + cpage = secondpg = __grab_shared_page(cmfn, &pld); if ( cpage == NULL ) { - mem_sharing_page_unlock(spage); + mem_sharing_page_unlock(spage, &pld); goto err_out; } } else { ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; - cpage = firstpg = __grab_shared_page(cmfn); + cpage = firstpg = __grab_shared_page(cmfn, &pld); if ( cpage == NULL ) goto err_out; ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; - spage = secondpg = __grab_shared_page(smfn); + spage = secondpg = __grab_shared_page(smfn, &pld); if ( spage == NULL ) { - mem_sharing_page_unlock(cpage); + mem_sharing_page_unlock(cpage, &pld); goto err_out; } } @@ -732,15 +754,15 @@ int mem_sharing_share_pages(struct domai if ( spage->shared_info->handle != sh ) { ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; - mem_sharing_page_unlock(secondpg); - mem_sharing_page_unlock(firstpg); + mem_sharing_page_unlock(secondpg, &pld); + mem_sharing_page_unlock(firstpg, &pld); goto err_out; } if ( cpage->shared_info->handle != ch ) { ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; - mem_sharing_page_unlock(secondpg); - mem_sharing_page_unlock(firstpg); + 
mem_sharing_page_unlock(secondpg, &pld); + mem_sharing_page_unlock(firstpg, &pld); goto err_out; } @@ -767,8 +789,8 @@ int mem_sharing_share_pages(struct domai xfree(cpage->shared_info); cpage->shared_info = NULL; - mem_sharing_page_unlock(secondpg); - mem_sharing_page_unlock(firstpg); + mem_sharing_page_unlock(secondpg, &pld); + mem_sharing_page_unlock(firstpg, &pld); /* Free the client page */ if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) @@ -796,6 +818,7 @@ int mem_sharing_unshare_page(struct doma int last_gfn; gfn_info_t *gfn_info = NULL; struct list_head *le; + DECLARE_PG_LOCK_DATA(pld); /* This is one of the reasons why we can''t enforce ordering * between shr_lock and p2m fine-grained locks in mm-lock. @@ -811,7 +834,7 @@ int mem_sharing_unshare_page(struct doma return 0; } - page = __grab_shared_page(mfn); + page = __grab_shared_page(mfn, &pld); if ( page == NULL ) { gdprintk(XENLOG_ERR, "Domain p2m is shared, but page is not: " @@ -848,7 +871,7 @@ gfn_found: if ( flags & MEM_SHARING_DESTROY_GFN ) { put_page_and_type(page); - mem_sharing_page_unlock(page); + mem_sharing_page_unlock(page, &pld); if ( last_gfn && test_and_clear_bit(_PGC_allocated, &page->count_info) ) put_page(page); @@ -861,7 +884,7 @@ gfn_found: if ( last_gfn ) { /* Making a page private atomically unlocks it */ - BUG_ON(page_make_private(d, page) != 0); + BUG_ON(page_make_private(d, page, &pld) != 0); goto private_page_found; } @@ -869,7 +892,7 @@ gfn_found: page = alloc_domheap_page(d, 0); if ( !page ) { - mem_sharing_page_unlock(old_page); + mem_sharing_page_unlock(old_page, &pld); put_gfn(d, gfn); mem_sharing_notify_helper(d, gfn); shr_unlock(); @@ -883,7 +906,7 @@ gfn_found: unmap_domain_page(t); BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0); - mem_sharing_page_unlock(old_page); + mem_sharing_page_unlock(old_page, &pld); put_page_and_type(old_page); private_page_found: diff -r 11916fe20dd2 -r c906c616d5ac xen/arch/x86/mm/mm-locks.h --- a/xen/arch/x86/mm/mm-locks.h +++ b/xen/arch/x86/mm/mm-locks.h @@ -156,7 +156,23 @@ declare_mm_lock(shr) #else -/* We use an efficient per-page lock when AUDIT is not enabled. */ +/* Sharing per page lock + * + * This is an external lock, not represented by an mm_lock_t. The memory + * sharing lock uses it to protect addition and removal of (gfn,domain) + * tuples to a shared page. We enforce order here against the p2m lock, + * which is taken after the page_lock to change the gfn''s p2m entry. + * + * Note that in sharing audit mode, we use the global page lock above, + * instead. + * + * The lock is recursive because during share we lock two pages. */ + +declare_mm_order_constraint(per_page_sharing) +#define page_sharing_mm_pre_lock() mm_enforce_order_lock_pre_per_page_sharing() +#define page_sharing_mm_post_lock(l, r) \ + mm_enforce_order_lock_post_per_page_sharing((l), (r)) +#define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r)) #endif /* MEM_SHARING_AUDIT */ diff -r 11916fe20dd2 -r c906c616d5ac xen/include/asm-x86/mm.h --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -351,7 +351,8 @@ void clear_superpage_mark(struct page_in * backing. Nesting may happen when sharing (and locking) two pages -- deadlock * is avoided by locking pages in increasing order. * Memory sharing may take the p2m_lock within a page_lock/unlock - * critical section. + * critical section. We enforce ordering between page_lock and p2m_lock using an + * mm-locks.h construct. 
* * These two users (pte serialization and memory sharing) do not collide, since * sharing is only supported for hvm guests, which do not perform pv pte updates.
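
To make the deadlock-avoidance rule concrete, here is a minimal sketch of the
two-page locking discipline (illustrative only; it assumes the
mem_sharing_page_lock/unlock helpers and the pg_lock_data_t introduced by this
patch, and is not the in-tree code):

/* Lock two shared pages without risk of deadlock: always take the
 * page with the smaller MFN first, mirroring the rule stated above. */
static int lock_pair_in_order(struct page_info *a, mfn_t mfn_a,
                              struct page_info *b, mfn_t mfn_b,
                              pg_lock_data_t *pld)
{
    struct page_info *first  = (mfn_x(mfn_a) < mfn_x(mfn_b)) ? a : b;
    struct page_info *second = (first == a) ? b : a;

    if ( !mem_sharing_page_lock(first, pld) )
        return 0;
    if ( !mem_sharing_page_lock(second, pld) )
    {
        mem_sharing_page_unlock(first, pld);
        return 0;
    }
    return 1;
}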
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 05 of 12] x86/mm: Check how many mfns are shared, in addition to how many are saved
xen/arch/x86/mm.c | 6 ------ xen/arch/x86/mm/mem_sharing.c | 24 ++++++++++++++++++++++++ xen/arch/x86/x86_64/compat/mm.c | 6 ++++++ xen/arch/x86/x86_64/mm.c | 7 +++++++ xen/include/asm-x86/mem_sharing.h | 1 + xen/include/public/memory.h | 1 + 6 files changed, 39 insertions(+), 6 deletions(-) This patch also moves the existing sharing-related memory op to the correct location, and adds logic to the audit() method that uses the new information. This patch only provides the Xen implementation of the domctls. Signed-off-by: Andres Lagar-Cavilla <andres@scannell.ca> Signed-off-by: Adin Scannell <adin@scannell.ca> Acked-by: Tim Deegan <tim@xen.org> diff -r c906c616d5ac -r b0c70c156920 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -119,7 +119,6 @@ #include <xen/trace.h> #include <asm/setup.h> #include <asm/fixmap.h> -#include <asm/mem_sharing.h> /* * Mapping of first 2 or 4 megabytes of memory. This is mapped with 4kB @@ -5024,11 +5023,6 @@ long arch_memory_op(int op, XEN_GUEST_HA return rc; } -#ifdef __x86_64__ - case XENMEM_get_sharing_freed_pages: - return mem_sharing_get_nr_saved_mfns(); -#endif - default: return subarch_memory_op(op, arg); } diff -r c906c616d5ac -r b0c70c156920 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -141,6 +141,7 @@ static inline shr_handle_t get_next_hand #define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg)) static atomic_t nr_saved_mfns = ATOMIC_INIT(0); +static atomic_t nr_shared_mfns = ATOMIC_INIT(0); typedef struct gfn_info { @@ -211,9 +212,12 @@ static struct page_info* mem_sharing_loo static int mem_sharing_audit(void) { int errors = 0; + unsigned long count_expected; + unsigned long count_found = 0; struct list_head *ae; ASSERT(shr_locked_by_me()); + count_expected = atomic_read(&nr_shared_mfns); list_for_each(ae, &shr_audit_list) { @@ -261,6 +265,9 @@ static int mem_sharing_audit(void) continue; } + /* We''ve found a page that is shared */ + count_found++; + /* Check if all GFNs map to the MFN, and the p2m types */ list_for_each(le, &pg->shared_info->gfns) { @@ -306,6 +313,13 @@ static int mem_sharing_audit(void) } } + if ( count_found != count_expected ) + { + MEM_SHARING_DEBUG("Expected %ld shared mfns, found %ld.", + count_expected, count_found); + errors++; + } + return errors; } #endif @@ -347,6 +361,11 @@ unsigned int mem_sharing_get_nr_saved_mf return ((unsigned int)atomic_read(&nr_saved_mfns)); } +unsigned int mem_sharing_get_nr_shared_mfns(void) +{ + return (unsigned int)atomic_read(&nr_shared_mfns); +} + int mem_sharing_sharing_resume(struct domain *d) { mem_event_response_t rsp; @@ -673,6 +692,9 @@ int mem_sharing_nominate_page(struct dom goto out; } + /* Account for this page. */ + atomic_inc(&nr_shared_mfns); + /* Update m2p entry to SHARED_M2P_ENTRY */ set_gpfn_from_mfn(mfn_x(mfn), SHARED_M2P_ENTRY); @@ -797,6 +819,7 @@ int mem_sharing_share_pages(struct domai put_page(cpage); /* We managed to free a domain page. 
*/ + atomic_dec(&nr_shared_mfns); atomic_inc(&nr_saved_mfns); ret = 0; @@ -863,6 +886,7 @@ gfn_found: audit_del_list(page); xfree(page->shared_info); page->shared_info = NULL; + atomic_dec(&nr_shared_mfns); } else atomic_dec(&nr_saved_mfns); diff -r c906c616d5ac -r b0c70c156920 xen/arch/x86/x86_64/compat/mm.c --- a/xen/arch/x86/x86_64/compat/mm.c +++ b/xen/arch/x86/x86_64/compat/mm.c @@ -205,6 +205,12 @@ int compat_arch_memory_op(int op, XEN_GU break; } + case XENMEM_get_sharing_freed_pages: + return mem_sharing_get_nr_saved_mfns(); + + case XENMEM_get_sharing_shared_pages: + return mem_sharing_get_nr_shared_mfns(); + default: rc = -ENOSYS; break; diff -r c906c616d5ac -r b0c70c156920 xen/arch/x86/x86_64/mm.c --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -34,6 +34,7 @@ #include <asm/msr.h> #include <asm/setup.h> #include <asm/numa.h> +#include <asm/mem_sharing.h> #include <public/memory.h> /* Parameters for PFN/MADDR compression. */ @@ -1093,6 +1094,12 @@ long subarch_memory_op(int op, XEN_GUEST break; + case XENMEM_get_sharing_freed_pages: + return mem_sharing_get_nr_saved_mfns(); + + case XENMEM_get_sharing_shared_pages: + return mem_sharing_get_nr_shared_mfns(); + default: rc = -ENOSYS; break; diff -r c906c616d5ac -r b0c70c156920 xen/include/asm-x86/mem_sharing.h --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -45,6 +45,7 @@ struct page_sharing_info (is_hvm_domain(_d) && paging_mode_hap(_d)) unsigned int mem_sharing_get_nr_saved_mfns(void); +unsigned int mem_sharing_get_nr_shared_mfns(void); int mem_sharing_nominate_page(struct domain *d, unsigned long gfn, int expected_refcnt, diff -r c906c616d5ac -r b0c70c156920 xen/include/public/memory.h --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -294,6 +294,7 @@ typedef struct xen_pod_target xen_pod_ta * The call never fails. */ #define XENMEM_get_sharing_freed_pages 18 +#define XENMEM_get_sharing_shared_pages 19 #endif /* __XEN_PUBLIC_MEMORY_H__ */
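
To see how the two counters relate, a worked accounting example (illustrative
walk-through, not in-tree code):

/*
 * dom1:gfnA and dom2:gfnB initially hold identical private frames
 * F1 and F2 (2 frames in use; nr_shared = 0, nr_saved = 0).
 *
 *   nominate(dom1, gfnA)  -> F1 shared, 1 gfn:  nr_shared = 1
 *   nominate(dom2, gfnB)  -> F2 shared, 1 gfn:  nr_shared = 2
 *   share(F1 <- F2)       -> F2 freed, gfnB repointed at F1:
 *                            nr_shared = 1, nr_saved = 1
 *
 * One frame now backs both gfns, and the pre-sharing usage (2 frames)
 * equals nr_saved + nr_shared.  This is the relationship the audit
 * count check above, and the new XENMEM queries, rely on.
 */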
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 06 of 12] x86/mm: New domctl: add a shared page to the physmap
xen/arch/x86/mm/mem_sharing.c | 104 ++++++++++++++++++++++++++++++++++++++++++ xen/include/public/domctl.h | 3 +- 2 files changed, 106 insertions(+), 1 deletions(-) This domctl is useful to, for example, populate parts of a domain''s physmap with shared frames, directly. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r b0c70c156920 -r da6576b82e16 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -830,6 +830,74 @@ err_out: return ret; } +int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh, + struct domain *cd, unsigned long cgfn) +{ + struct page_info *spage; + int ret = -EINVAL; + mfn_t smfn, cmfn; + p2m_type_t smfn_type, cmfn_type; + struct gfn_info *gfn_info; + struct p2m_domain *p2m = p2m_get_hostp2m(cd); + p2m_access_t a; + DECLARE_PG_LOCK_DATA(pld); + + /* XXX if sd == cd handle potential deadlock by ordering + * the get_ and put_gfn''s */ + smfn = get_gfn_query(sd, sgfn, &smfn_type); + cmfn = get_gfn_type_access(p2m, cgfn, &cmfn_type, &a, p2m_query, NULL); + + /* Get the source shared page, check and lock */ + ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; + spage = __grab_shared_page(smfn, &pld); + if ( spage == NULL ) + goto err_out; + ASSERT(smfn_type == p2m_ram_shared); + + /* Check that the handles match */ + if ( spage->shared_info->handle != sh ) + goto err_unlock; + + /* Make sure the target page is a hole in the physmap */ + if ( mfn_valid(cmfn) || + (!(p2m_is_ram(cmfn_type))) ) + { + ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; + goto err_unlock; + } + + /* This is simpler than regular sharing */ + BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page)); + if ( (gfn_info = mem_sharing_gfn_alloc(spage, cd, cgfn)) == NULL ) + { + put_page_and_type(spage); + ret = -ENOMEM; + goto err_unlock; + } + + p2m_lock(p2m); + ret = set_p2m_entry(p2m, cgfn, smfn, PAGE_ORDER_4K, p2m_ram_shared, a); + p2m_unlock(p2m); + + /* Tempted to turn this into an assert */ + if ( !ret ) + { + ret = -ENOENT; + mem_sharing_gfn_destroy(cd, gfn_info); + put_page_and_type(spage); + } else + ret = 0; + + atomic_inc(&nr_saved_mfns); + +err_unlock: + mem_sharing_page_unlock(spage, &pld); +err_out: + put_gfn(cd, cgfn); + put_gfn(sd, sgfn); + return ret; +} + int mem_sharing_unshare_page(struct domain *d, unsigned long gfn, uint16_t flags) @@ -1053,6 +1121,42 @@ int mem_sharing_domctl(struct domain *d, } break; + case XEN_DOMCTL_MEM_EVENT_OP_SHARING_ADD_PHYSMAP: + { + unsigned long sgfn, cgfn; + struct domain *cd; + shr_handle_t sh; + + if ( !mem_sharing_enabled(d) ) + return -EINVAL; + + cd = get_domain_by_id(mec->u.share.client_domain); + if ( !cd ) + return -ESRCH; + + if ( !mem_sharing_enabled(cd) ) + { + put_domain(cd); + return -EINVAL; + } + + if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.source_gfn) ) + { + /* Cannot add a gref to the physmap */ + put_domain(cd); + return -EINVAL; + } + + sgfn = mec->u.share.source_gfn; + sh = mec->u.share.source_handle; + cgfn = mec->u.share.client_gfn; + + rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn); + + put_domain(cd); + } + break; + case XEN_DOMCTL_MEM_EVENT_OP_SHARING_RESUME: { if ( !mem_sharing_enabled(d) ) diff -r b0c70c156920 -r da6576b82e16 xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -771,6 +771,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_mem_e #define XEN_DOMCTL_MEM_EVENT_OP_SHARING_DEBUG_GFN 5 #define 
XEN_DOMCTL_MEM_EVENT_OP_SHARING_DEBUG_MFN 6 #define XEN_DOMCTL_MEM_EVENT_OP_SHARING_DEBUG_GREF 7 +#define XEN_DOMCTL_MEM_EVENT_OP_SHARING_ADD_PHYSMAP 8 #define XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID (-10) #define XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID (-9) @@ -797,7 +798,7 @@ struct xen_domctl_mem_sharing_op { } u; uint64_aligned_t handle; /* OUT: the handle */ } nominate; - struct mem_sharing_op_share { /* OP_SHARE */ + struct mem_sharing_op_share { /* OP_SHARE/ADD_PHYSMAP */ uint64_aligned_t source_gfn; /* IN: the gfn of the source page */ uint64_aligned_t source_handle; /* IN: handle to the source page */ domid_t client_domain; /* IN: the client domain id */
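
For instance, a toolstack could combine this with nomination to place a shared
frame into a hole in a client domain's physmap. A sketch using the libxc
wrappers added later in this series (error handling elided; domain ids and
gfns are placeholders):

/* Nominate a source page and map it into an unpopulated gfn of the
 * client domain.  Sketch only, not in-tree code. */
static int populate_hole_with_shared(xc_interface *xch,
                                     domid_t source, unsigned long sgfn,
                                     domid_t client, unsigned long cgfn)
{
    uint64_t handle;
    int rc;

    rc = xc_memshr_nominate_gfn(xch, source, sgfn, &handle);
    if ( rc < 0 )
        return rc;

    /* cgfn must currently be a hole in the client's physmap. */
    return xc_memshr_add_to_physmap(xch, source, sgfn, handle,
                                    client, cgfn);
}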
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 07 of 12] Add the ability to poll stats about shared memory via the console
xen/arch/ia64/xen/mm.c | 5 +++++ xen/arch/x86/mm.c | 14 ++++++++++++++ xen/common/keyhandler.c | 7 +++++-- xen/include/xen/mm.h | 3 +++ 4 files changed, 27 insertions(+), 2 deletions(-) Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r da6576b82e16 -r 98cc0713476f xen/arch/ia64/xen/mm.c --- a/xen/arch/ia64/xen/mm.c +++ b/xen/arch/ia64/xen/mm.c @@ -3573,6 +3573,11 @@ p2m_pod_decrease_reservation(struct doma return 0; } +/* Simple no-op */ +void arch_dump_shared_mem_info(void) +{ +} + /* * Local variables: * mode: C diff -r da6576b82e16 -r 98cc0713476f xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -113,6 +113,7 @@ #include <asm/e820.h> #include <asm/hypercall.h> #include <asm/shared.h> +#include <asm/mem_sharing.h> #include <public/memory.h> #include <public/sched.h> #include <xsm/xsm.h> @@ -5832,6 +5833,19 @@ void memguard_unguard_stack(void *p) memguard_unguard_range(p, PAGE_SIZE); } +#if defined(__x86_64__) +void arch_dump_shared_mem_info(void) +{ + printk("Shared frames %u -- Saved frames %u\n", + mem_sharing_get_nr_shared_mfns(), + mem_sharing_get_nr_saved_mfns()); +} +#else +void arch_dump_shared_mem_info(void) +{ +} +#endif + /* * Local variables: * mode: C diff -r da6576b82e16 -r 98cc0713476f xen/common/keyhandler.c --- a/xen/common/keyhandler.c +++ b/xen/common/keyhandler.c @@ -15,6 +15,7 @@ #include <xen/compat.h> #include <xen/ctype.h> #include <xen/perfc.h> +#include <xen/mm.h> #include <asm/debugger.h> #include <asm/div64.h> @@ -248,8 +249,8 @@ static void dump_domains(unsigned char k printk(" refcnt=%d dying=%d pause_count=%d\n", atomic_read(&d->refcnt), d->is_dying, atomic_read(&d->pause_count)); - printk(" nr_pages=%d xenheap_pages=%d dirty_cpus=%s max_pages=%u\n", - d->tot_pages, d->xenheap_pages, tmpstr, d->max_pages); + printk(" nr_pages=%d xenheap_pages=%d shared_pages=%u dirty_cpus=%s max_pages=%u\n", + d->tot_pages, d->xenheap_pages, atomic_read(&d->shr_pages), tmpstr, d->max_pages); printk(" handle=%02x%02x%02x%02x-%02x%02x-%02x%02x-" "%02x%02x-%02x%02x%02x%02x%02x%02x vm_assist=%08lx\n", d->handle[ 0], d->handle[ 1], d->handle[ 2], d->handle[ 3], @@ -308,6 +309,8 @@ static void dump_domains(unsigned char k } } + arch_dump_shared_mem_info(); + rcu_read_unlock(&domlist_read_lock); #undef tmpstr } diff -r da6576b82e16 -r 98cc0713476f xen/include/xen/mm.h --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -73,6 +73,9 @@ int assign_pages( unsigned int order, unsigned int memflags); +/* Dump info to serial console */ +void arch_dump_shared_mem_info(void); + /* memflags: */ #define _MEMF_no_refcount 0 #define MEMF_no_refcount (1U<<_MEMF_no_refcount)
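
With this applied, the 'q' keyhandler's per-domain line and the new arch
summary would look roughly like the following (all values invented for
illustration; only the formats come from the patch):

(XEN)     nr_pages=131072 xenheap_pages=6 shared_pages=2048 dirty_cpus={2-3} max_pages=131328
...
(XEN) Shared frames 1024 -- Saved frames 3072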
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 08 of 12] x86/mm: use RCU in mem sharing audit list, eliminate global lock completely
xen/arch/x86/mm/mem_sharing.c | 94 +++++++++++++++++--------------------- xen/arch/x86/mm/mm-locks.h | 18 ------- xen/include/asm-x86/mem_sharing.h | 1 + 3 files changed, 43 insertions(+), 70 deletions(-) Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 98cc0713476f -r 288ac8cfea7f xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -32,6 +32,7 @@ #include <asm/p2m.h> #include <asm/mem_event.h> #include <asm/atomic.h> +#include <xen/rcupdate.h> #include "mm-locks.h" @@ -47,52 +48,52 @@ typedef struct pg_lock_data { #if MEM_SHARING_AUDIT -static mm_lock_t shr_lock; - -#define shr_lock() _shr_lock() -#define shr_unlock() _shr_unlock() -#define shr_locked_by_me() _shr_locked_by_me() - static int mem_sharing_audit(void); #define MEM_SHARING_DEBUG(_f, _a...) \ debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a) static struct list_head shr_audit_list; +static spinlock_t shr_audit_lock; +DEFINE_RCU_READ_LOCK(shr_audit_read_lock); + +/* RCU delayed free of audit list entry */ +static void _free_pg_shared_info(struct rcu_head *head) +{ + xfree(container_of(head, struct page_sharing_info, rcu_head)); +} static inline void audit_add_list(struct page_info *page) { INIT_LIST_HEAD(&page->shared_info->entry); - list_add(&page->shared_info->entry, &shr_audit_list); + spin_lock(&shr_audit_lock); + list_add_rcu(&page->shared_info->entry, &shr_audit_list); + spin_unlock(&shr_audit_lock); } static inline void audit_del_list(struct page_info *page) { - list_del(&page->shared_info->entry); + spin_lock(&shr_audit_lock); + list_del_rcu(&page->shared_info->entry); + spin_unlock(&shr_audit_lock); + INIT_RCU_HEAD(&page->shared_info->rcu_head); + call_rcu(&page->shared_info->rcu_head, _free_pg_shared_info); } -static inline int mem_sharing_page_lock(struct page_info *p, - pg_lock_data_t *l) -{ - return 1; -} -#define mem_sharing_page_unlock(p, l) ((void)0) - -#define get_next_handle() next_handle++; #else -#define shr_lock() ((void)0) -#define shr_unlock() ((void)0) -/* Only used inside audit code */ -//#define shr_locked_by_me() ((void)0) - static inline int mem_sharing_audit(void) { return 0; } #define audit_add_list(p) ((void)0) -#define audit_del_list(p) ((void)0) +static inline void audit_del_list(struct page_info *page) +{ + xfree(page->shared_info); +} + +#endif /* MEM_SHARING_AUDIT */ static inline int mem_sharing_page_lock(struct page_info *pg, pg_lock_data_t *pld) @@ -128,7 +129,6 @@ static inline shr_handle_t get_next_hand while ( (y = cmpxchg(&next_handle, x, x + 1)) != x ); return x + 1; } -#endif /* MEM_SHARING_AUDIT */ #define mem_sharing_enabled(d) \ (is_hvm_domain(d) && (d)->arch.hvm_domain.mem_sharing_enabled) @@ -215,11 +215,13 @@ static int mem_sharing_audit(void) unsigned long count_expected; unsigned long count_found = 0; struct list_head *ae; + DECLARE_PG_LOCK_DATA(pld); - ASSERT(shr_locked_by_me()); count_expected = atomic_read(&nr_shared_mfns); - list_for_each(ae, &shr_audit_list) + rcu_read_lock(&shr_audit_read_lock); + + list_for_each_rcu(ae, &shr_audit_list) { struct page_sharing_info *shared_info; unsigned long nr_gfns = 0; @@ -231,6 +233,15 @@ static int mem_sharing_audit(void) pg = shared_info->pg; mfn = page_to_mfn(pg); + /* If we can''t lock it, it''s definitely not a shared page */ + if ( !mem_sharing_page_lock(pg, &pld) ) + { + MEM_SHARING_DEBUG("mfn %lx in audit list, but cannot be locked (%lx)!\n", + mfn_x(mfn), pg->u.inuse.type_info); + errors++; + continue; + } + /* Check if the MFN has 
correct type, owner and handle. */ if ( !(pg->u.inuse.type_info & PGT_shared_page) ) { @@ -311,8 +322,12 @@ static int mem_sharing_audit(void) (pg->u.inuse.type_info & PGT_count_mask)); errors++; } + + mem_sharing_page_unlock(pg, &pld); } + rcu_read_unlock(&shr_audit_read_lock); + if ( count_found != count_expected ) { MEM_SHARING_DEBUG("Expected %ld shared mfns, found %ld.", @@ -538,14 +553,10 @@ static int page_make_private(struct doma spin_lock(&d->page_alloc_lock); /* We can only change the type if count is one */ - /* If we are locking pages individually, then we need to drop + /* Because we are locking pages individually, we need to drop * the lock here, while the page is typed. We cannot risk the * race of page_unlock and then put_page_type. */ -#if MEM_SHARING_AUDIT - expected_type = (PGT_shared_page | PGT_validated | 1); -#else expected_type = (PGT_shared_page | PGT_validated | PGT_locked | 2); -#endif if ( page->u.inuse.type_info != expected_type ) { put_page(page); @@ -556,12 +567,8 @@ static int page_make_private(struct doma /* Drop the final typecount */ put_page_and_type(page); -#if MEM_SHARING_AUDIT - (void)pld; -#else /* Now that we''ve dropped the type, we can unlock */ mem_sharing_page_unlock(page, pld); -#endif /* Change the owner */ ASSERT(page_get_owner(page) == dom_cow); @@ -613,7 +620,6 @@ int mem_sharing_nominate_page(struct dom *phandle = 0UL; - shr_lock(); mfn = get_gfn(d, gfn, &p2mt); /* Check if mfn is valid */ @@ -705,7 +711,6 @@ int mem_sharing_nominate_page(struct dom out: put_gfn(d, gfn); - shr_unlock(); return ret; } @@ -721,8 +726,6 @@ int mem_sharing_share_pages(struct domai p2m_type_t smfn_type, cmfn_type; DECLARE_PG_LOCK_DATA(pld); - shr_lock(); - /* XXX if sd == cd handle potential deadlock by ordering * the get_ and put_gfn''s */ smfn = get_gfn(sd, sgfn, &smfn_type); @@ -808,7 +811,6 @@ int mem_sharing_share_pages(struct domai /* Clear the rest of the shared state */ audit_del_list(cpage); - xfree(cpage->shared_info); cpage->shared_info = NULL; mem_sharing_page_unlock(secondpg, &pld); @@ -826,7 +828,6 @@ int mem_sharing_share_pages(struct domai err_out: put_gfn(cd, cgfn); put_gfn(sd, sgfn); - shr_unlock(); return ret; } @@ -911,17 +912,12 @@ int mem_sharing_unshare_page(struct doma struct list_head *le; DECLARE_PG_LOCK_DATA(pld); - /* This is one of the reasons why we can''t enforce ordering - * between shr_lock and p2m fine-grained locks in mm-lock. - * Callers may walk in here already holding the lock for this gfn */ - shr_lock(); mem_sharing_audit(); mfn = get_gfn(d, gfn, &p2mt); /* Has someone already unshared it? 
*/ if ( !p2m_is_shared(p2mt) ) { put_gfn(d, gfn); - shr_unlock(); return 0; } @@ -952,7 +948,6 @@ gfn_found: { /* Clean up shared state */ audit_del_list(page); - xfree(page->shared_info); page->shared_info = NULL; atomic_dec(&nr_shared_mfns); } @@ -968,7 +963,6 @@ gfn_found: test_and_clear_bit(_PGC_allocated, &page->count_info) ) put_page(page); put_gfn(d, gfn); - shr_unlock(); return 0; } @@ -987,7 +981,6 @@ gfn_found: mem_sharing_page_unlock(old_page, &pld); put_gfn(d, gfn); mem_sharing_notify_helper(d, gfn); - shr_unlock(); return -ENOMEM; } @@ -1018,7 +1011,6 @@ private_page_found: paging_mark_dirty(d, mfn_x(page_to_mfn(page))); /* We do not need to unlock a private page */ put_gfn(d, gfn); - shr_unlock(); return 0; } @@ -1191,9 +1183,7 @@ int mem_sharing_domctl(struct domain *d, break; } - shr_lock(); mem_sharing_audit(); - shr_unlock(); return rc; } @@ -1202,7 +1192,7 @@ void __init mem_sharing_init(void) { printk("Initing memory sharing.\n"); #if MEM_SHARING_AUDIT - mm_lock_init(&shr_lock); + spin_lock_init(&shr_audit_lock); INIT_LIST_HEAD(&shr_audit_list); #endif } diff -r 98cc0713476f -r 288ac8cfea7f xen/arch/x86/mm/mm-locks.h --- a/xen/arch/x86/mm/mm-locks.h +++ b/xen/arch/x86/mm/mm-locks.h @@ -143,19 +143,6 @@ static inline void mm_enforce_order_unlo * * ************************************************************************/ -#if MEM_SHARING_AUDIT -/* Page-sharing lock (global) - * - * A single global lock that protects the memory-sharing code''s - * hash tables. */ - -declare_mm_lock(shr) -#define _shr_lock() mm_lock(shr, &shr_lock) -#define _shr_unlock() mm_unlock(&shr_lock) -#define _shr_locked_by_me() mm_locked_by_me(&shr_lock) - -#else - /* Sharing per page lock * * This is an external lock, not represented by an mm_lock_t. The memory @@ -163,9 +150,6 @@ declare_mm_lock(shr) * tuples to a shared page. We enforce order here against the p2m lock, * which is taken after the page_lock to change the gfn''s p2m entry. * - * Note that in sharing audit mode, we use the global page lock above, - * instead. - * * The lock is recursive because during share we lock two pages. */ declare_mm_order_constraint(per_page_sharing) @@ -174,8 +158,6 @@ declare_mm_order_constraint(per_page_sha mm_enforce_order_lock_post_per_page_sharing((l), (r)) #define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r)) -#endif /* MEM_SHARING_AUDIT */ - /* Nested P2M lock (per-domain) * * A per-domain lock that protects the mapping from nested-CR3 to diff -r 98cc0713476f -r 288ac8cfea7f xen/include/asm-x86/mem_sharing.h --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -35,6 +35,7 @@ struct page_sharing_info shr_handle_t handle; /* Globally unique version / handle. */ #if MEM_SHARING_AUDIT struct list_head entry; /* List of all shared pages (entry). */ + struct rcu_head rcu_head; /* List of all shared pages (entry). */ #endif struct list_head gfns; /* List of domains and gfns for this page (head). */ };
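
The reader side of the new scheme, reduced to a sketch (assuming the
surrounding declarations in mem_sharing_audit()): the walk itself needs only
the RCU read lock, and each entry is validated with the per-page trylock, so
an entry being torn down concurrently is reported and skipped rather than
dereferenced after free.

/* Sketch of the lock-free audit walk; mirrors the hunks above. */
rcu_read_lock(&shr_audit_read_lock);
list_for_each_rcu(ae, &shr_audit_list)
{
    struct page_sharing_info *shared_info =
        list_entry(ae, struct page_sharing_info, entry);
    struct page_info *pg = shared_info->pg;

    if ( !mem_sharing_page_lock(pg, &pld) )
        continue;                 /* not (or no longer) shared */
    /* ... validate pg->shared_info->gfns under the page lock ... */
    mem_sharing_page_unlock(pg, &pld);
}
rcu_read_unlock(&shr_audit_read_lock);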
tools/blktap2/drivers/tapdisk-vbd.c | 6 +- tools/blktap2/drivers/tapdisk.h | 2 +- tools/libxc/xc_memshr.c | 63 ++++++++++++++++++++++++++++++++++-- tools/libxc/xenctrl.h | 19 ++++++++++- tools/memshr/bidir-daemon.c | 34 ++++++++++++++----- tools/memshr/bidir-hash.h | 12 ++++-- tools/memshr/interface.c | 30 ++++++++++------- tools/memshr/memshr.h | 11 +++++- 8 files changed, 139 insertions(+), 38 deletions(-) This patch is the folded version of API updates, along with the associated tool changes to ensure that the build is always consistent. API updates: - The source domain in the sharing calls is no longer assumed to be dom0. - Previously, the mem sharing code would return an opaque handle to index shared pages (and nominees) in its global hash table. By removing the hash table, the handle becomes a version, to avoid sharing a stale version of a page. Thus, libxc wrappers and tools need to be updated to recall the share functions with the information needed to fetch the page (which they readily have). Tool updates: The only (in-tree, that we know of) consumer of the mem sharing API is the memshr tool. This is updated to use the new API. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/blktap2/drivers/tapdisk-vbd.c --- a/tools/blktap2/drivers/tapdisk-vbd.c +++ b/tools/blktap2/drivers/tapdisk-vbd.c @@ -1218,14 +1218,14 @@ __tapdisk_vbd_complete_td_request(td_vbd #ifdef MEMSHR if (treq.op == TD_OP_READ && td_flag_test(image->flags, TD_OPEN_RDONLY)) { - uint64_t hnd = treq.memshr_hnd; + share_tuple_t hnd = treq.memshr_hnd; uint16_t uid = image->memshr_id; blkif_request_t *breq = &vreq->req; uint64_t sec = tapdisk_vbd_breq_get_sector(breq, treq); int secs = breq->seg[treq.sidx].last_sect - breq->seg[treq.sidx].first_sect + 1; - if (hnd != 0) + if (hnd.handle != 0) memshr_vbd_complete_ro_request(hnd, uid, sec, secs); } @@ -1297,7 +1297,7 @@ __tapdisk_vbd_reissue_td_request(td_vbd_ /* Reset memshr handle. 
This''ll prevent * memshr_vbd_complete_ro_request being called */ - treq.memshr_hnd = 0; + treq.memshr_hnd.handle = 0; td_complete_request(treq, 0); } else td_queue_read(parent, treq); diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/blktap2/drivers/tapdisk.h --- a/tools/blktap2/drivers/tapdisk.h +++ b/tools/blktap2/drivers/tapdisk.h @@ -140,7 +140,7 @@ struct td_request { void *private; #ifdef MEMSHR - uint64_t memshr_hnd; + share_tuple_t memshr_hnd; #endif }; diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/libxc/xc_memshr.c --- a/tools/libxc/xc_memshr.c +++ b/tools/libxc/xc_memshr.c @@ -86,24 +86,79 @@ int xc_memshr_nominate_gref(xc_interface return ret; } -int xc_memshr_share(xc_interface *xch, - uint64_t source_handle, - uint64_t client_handle) +int xc_memshr_share_gfns(xc_interface *xch, + domid_t source_domain, + unsigned long source_gfn, + uint64_t source_handle, + domid_t client_domain, + unsigned long client_gfn, + uint64_t client_handle) { DECLARE_DOMCTL; struct xen_domctl_mem_sharing_op *op; domctl.cmd = XEN_DOMCTL_mem_sharing_op; domctl.interface_version = XEN_DOMCTL_INTERFACE_VERSION; - domctl.domain = 0; + domctl.domain = source_domain; op = &(domctl.u.mem_sharing_op); op->op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE; op->u.share.source_handle = source_handle; + op->u.share.source_gfn = source_gfn; + op->u.share.client_domain = client_domain; + op->u.share.client_gfn = client_gfn; op->u.share.client_handle = client_handle; return do_domctl(xch, &domctl); } +int xc_memshr_share_grefs(xc_interface *xch, + domid_t source_domain, + grant_ref_t source_gref, + uint64_t source_handle, + domid_t client_domain, + grant_ref_t client_gref, + uint64_t client_handle) +{ + DECLARE_DOMCTL; + struct xen_domctl_mem_sharing_op *op; + + domctl.cmd = XEN_DOMCTL_mem_sharing_op; + domctl.interface_version = XEN_DOMCTL_INTERFACE_VERSION; + domctl.domain = source_domain; + op = &(domctl.u.mem_sharing_op); + op->op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE; + op->u.share.source_handle = source_handle; + XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(op->u.share.source_gfn, source_gref); + op->u.share.client_domain = client_domain; + XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(op->u.share.client_gfn, client_gref); + op->u.share.client_handle = client_handle; + + return do_domctl(xch, &domctl); +} + +int xc_memshr_add_to_physmap(xc_interface *xch, + domid_t source_domain, + unsigned long source_gfn, + uint64_t source_handle, + domid_t client_domain, + unsigned long client_gfn) +{ + DECLARE_DOMCTL; + struct xen_domctl_mem_sharing_op *op; + + domctl.cmd = XEN_DOMCTL_mem_sharing_op; + domctl.interface_version = XEN_DOMCTL_INTERFACE_VERSION; + domctl.domain = source_domain; + op = &(domctl.u.mem_sharing_op); + op->op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_ADD_PHYSMAP; + op->u.share.source_gfn = source_gfn; + op->u.share.source_handle = source_handle; + op->u.share.client_gfn = client_gfn; + op->u.share.client_domain = client_domain; + + return do_domctl(xch, &domctl); +} + int xc_memshr_domain_resume(xc_interface *xch, domid_t domid) { diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -1894,9 +1894,26 @@ int xc_memshr_nominate_gref(xc_interface domid_t domid, grant_ref_t gref, uint64_t *handle); -int xc_memshr_share(xc_interface *xch, +int xc_memshr_share_gfns(xc_interface *xch, + domid_t source_domain, + unsigned long source_gfn, uint64_t source_handle, + domid_t client_domain, + unsigned long client_gfn, uint64_t client_handle); +int xc_memshr_share_grefs(xc_interface 
*xch, + domid_t source_domain, + grant_ref_t source_gref, + uint64_t source_handle, + domid_t client_domain, + grant_ref_t client_gref, + uint64_t client_handle); +int xc_memshr_add_to_physmap(xc_interface *xch, + domid_t source_domain, + unsigned long source_gfn, + uint64_t source_handle, + domid_t client_domain, + unsigned long client_gfn); int xc_memshr_domain_resume(xc_interface *xch, domid_t domid); int xc_memshr_debug_gfn(xc_interface *xch, diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/memshr/bidir-daemon.c --- a/tools/memshr/bidir-daemon.c +++ b/tools/memshr/bidir-daemon.c @@ -19,16 +19,25 @@ #include <pthread.h> #include <inttypes.h> #include <unistd.h> +#include <errno.h> #include "bidir-hash.h" #include "memshr-priv.h" static struct blockshr_hash *blks_hash; +/* Callback in the iterator, remember this value, and leave */ +int find_one(vbdblk_t k, share_tuple_t v, void *priv) +{ + share_tuple_t *rv = (share_tuple_t *) priv; + *rv = v; + /* Break out of iterator loop */ + return 1; +} + void* bidir_daemon(void *unused) { uint32_t nr_ent, max_nr_ent, tab_size, max_load, min_load; - static uint64_t shrhnd = 1; while(1) { @@ -41,20 +50,30 @@ void* bidir_daemon(void *unused) /* Remove some hints as soon as we get to 90% capacity */ if(10 * nr_ent > 9 * max_nr_ent) { - uint64_t next_remove = shrhnd; + share_tuple_t next_remove; int to_remove; int ret; to_remove = 0.1 * max_nr_ent; while(to_remove > 0) { - ret = blockshr_shrhnd_remove(blks_hash, next_remove, NULL); - if(ret < 0) + /* We use the iterator to get one entry */ + next_remove.handle = 0; + ret = blockshr_hash_iterator(blks_hash, find_one, &next_remove); + + if ( !ret ) + if ( next_remove.handle == 0 ) + ret = -ESRCH; + + if ( !ret ) + ret = blockshr_shrhnd_remove(blks_hash, next_remove, NULL); + + if(ret <= 0) { /* We failed to remove an entry, because of a serious hash * table error */ DPRINTF("Could not remove handle %"PRId64", error: %d\n", - next_remove, ret); + next_remove.handle, ret); /* Force to exit the loop early */ to_remove = 0; } else @@ -62,12 +81,7 @@ void* bidir_daemon(void *unused) { /* Managed to remove the entry. Note next_remove not * incremented, in case there are duplicates */ - shrhnd = next_remove; to_remove--; - } else - { - /* Failed to remove, because there is no such handle */ - next_remove++; } } } diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/memshr/bidir-hash.h --- a/tools/memshr/bidir-hash.h +++ b/tools/memshr/bidir-hash.h @@ -20,6 +20,7 @@ #define __BIDIR_HASH_H__ #include <stdint.h> +#include <string.h> #include "memshr-priv.h" typedef struct vbdblk { @@ -81,15 +82,16 @@ static int fgprtshr_mfn_cmp(uint32_t m1, #undef BIDIR_VALUE #undef BIDIR_KEY_T #undef BIDIR_VALUE_T + /* TODO better hashes! 
*/ static inline uint32_t blockshr_block_hash(vbdblk_t block) { return (uint32_t)(block.sec) ^ (uint32_t)(block.disk_id); } -static inline uint32_t blockshr_shrhnd_hash(uint64_t shrhnd) +static inline uint32_t blockshr_shrhnd_hash(share_tuple_t shrhnd) { - return (uint32_t)shrhnd; + return ((uint32_t) shrhnd.handle); } static inline int blockshr_block_cmp(vbdblk_t b1, vbdblk_t b2) @@ -97,15 +99,15 @@ static inline int blockshr_block_cmp(vbd return (b1.sec == b2.sec) && (b1.disk_id == b2.disk_id); } -static inline int blockshr_shrhnd_cmp(uint64_t h1, uint64_t h2) +static inline int blockshr_shrhnd_cmp(share_tuple_t h1, share_tuple_t h2) { - return (h1 == h2); + return ( !memcmp(&h1, &h2, sizeof(share_tuple_t)) ); } #define BIDIR_NAME_PREFIX blockshr #define BIDIR_KEY block #define BIDIR_VALUE shrhnd #define BIDIR_KEY_T vbdblk_t -#define BIDIR_VALUE_T uint64_t +#define BIDIR_VALUE_T share_tuple_t #include "bidir-namedefs.h" #endif /* BLOCK_MAP */ diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/memshr/interface.c --- a/tools/memshr/interface.c +++ b/tools/memshr/interface.c @@ -145,16 +145,17 @@ void memshr_vbd_image_put(uint16_t memsh int memshr_vbd_issue_ro_request(char *buf, grant_ref_t gref, - uint16_t file_id, + uint16_t file_id, uint64_t sec, int secs, - uint64_t *hnd) + share_tuple_t *hnd) { vbdblk_t blk; - uint64_t s_hnd, c_hnd; + share_tuple_t source_st, client_st; + uint64_t c_hnd; int ret; - *hnd = 0; + *hnd = (share_tuple_t){ 0, 0, 0 }; if(!vbd_info.enabled) return -1; @@ -169,26 +170,31 @@ int memshr_vbd_issue_ro_request(char *bu /* If page couldn''t be made sharable, we cannot do anything about it */ if(ret != 0) return -3; - *hnd = c_hnd; + + client_st = (share_tuple_t){ vbd_info.domid, gref, c_hnd }; + *hnd = client_st; /* Check if we''ve read matching disk block previously */ blk.sec = sec; blk.disk_id = file_id; - if(blockshr_block_lookup(memshr.blks, blk, &s_hnd) > 0) + if(blockshr_block_lookup(memshr.blks, blk, &source_st) > 0) { - ret = xc_memshr_share(vbd_info.xc_handle, s_hnd, c_hnd); + ret = xc_memshr_share_grefs(vbd_info.xc_handle, source_st.domain, source_st.frame, + source_st.handle, vbd_info.domid, gref, c_hnd); if(!ret) return 0; /* Handles failed to be shared => at least one of them must be invalid, remove the relevant ones from the map */ switch(ret) { case XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID: - ret = blockshr_shrhnd_remove(memshr.blks, s_hnd, NULL); - if(ret) DPRINTF("Could not rm invl s_hnd: %"PRId64"\n", s_hnd); + ret = blockshr_shrhnd_remove(memshr.blks, source_st, NULL); + if(ret) DPRINTF("Could not rm invl s_hnd: %u %"PRId64" %"PRId64"\n", + source_st.domain, source_st.frame, source_st.handle); break; case XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID: - ret = blockshr_shrhnd_remove(memshr.blks, c_hnd, NULL); - if(ret) DPRINTF("Could not rm invl c_hnd: %"PRId64"\n", c_hnd); + ret = blockshr_shrhnd_remove(memshr.blks, client_st, NULL); + if(ret) DPRINTF("Could not rm invl c_hnd: %u %"PRId64" %"PRId64"\n", + client_st.domain, client_st.frame, client_st.handle); break; default: break; @@ -199,7 +205,7 @@ int memshr_vbd_issue_ro_request(char *bu return -4; } -void memshr_vbd_complete_ro_request(uint64_t hnd, +void memshr_vbd_complete_ro_request(share_tuple_t hnd, uint16_t file_id, uint64_t sec, int secs) diff -r 288ac8cfea7f -r 2d6a14ce5b0e tools/memshr/memshr.h --- a/tools/memshr/memshr.h +++ b/tools/memshr/memshr.h @@ -25,6 +25,13 @@ typedef uint64_t xen_mfn_t; +typedef struct share_tuple +{ + uint32_t domain; + uint64_t frame; + uint64_t handle; +} share_tuple_t; 
+ extern void memshr_set_domid(int domid); extern void memshr_daemon_initialize(void); extern void memshr_vbd_initialize(void); @@ -35,9 +42,9 @@ extern int memshr_vbd_issue_ro_request(c uint16_t file_id, uint64_t sec, int secs, - uint64_t *hnd); + share_tuple_t *hnd); extern void memshr_vbd_complete_ro_request( - uint64_t hnd, + share_tuple_t hnd, uint16_t file_id, uint64_t sec, int secs);
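
Since the handle is now a version rather than a global index, callers must be
prepared for stale handles. A sketch of the retry pattern a client might use
(following memshr's interface.c convention of comparing the return value
directly against the XEN_DOMCTL_MEM_SHARING_*_HANDLE_INVALID codes; flow
simplified, variable names invented):

/* If the source page changed since nomination, the share call fails
 * with S_HANDLE_INVALID; re-nominate to obtain the current version. */
rc = xc_memshr_share_gfns(xch, src_dom, src_gfn, src_handle,
                          cli_dom, cli_gfn, cli_handle);
if ( rc == XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID )
{
    if ( xc_memshr_nominate_gfn(xch, src_dom, src_gfn, &src_handle) == 0 )
        rc = xc_memshr_share_gfns(xch, src_dom, src_gfn, src_handle,
                                  cli_dom, cli_gfn, cli_handle);
}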
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 10 of 12] Tools: Expose to libxc the total number of shared frames and space saved
tools/libxc/xc_memshr.c | 10 ++++++++++ tools/libxc/xenctrl.h | 17 +++++++++++++++++ 2 files changed, 27 insertions(+), 0 deletions(-) Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 2d6a14ce5b0e -r b596035ff0e2 tools/libxc/xc_memshr.c --- a/tools/libxc/xc_memshr.c +++ b/tools/libxc/xc_memshr.c @@ -225,3 +225,13 @@ int xc_memshr_debug_gref(xc_interface *x return do_domctl(xch, &domctl); } +long xc_sharing_freed_pages(xc_interface *xch) +{ + return do_memory_op(xch, XENMEM_get_sharing_freed_pages, NULL, 0); +} + +long xc_sharing_used_frames(xc_interface *xch) +{ + return do_memory_op(xch, XENMEM_get_sharing_shared_pages, NULL, 0); +} + diff -r 2d6a14ce5b0e -r b596035ff0e2 tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -1226,6 +1226,23 @@ int xc_mmuext_op(xc_interface *xch, stru /* System wide memory properties */ long xc_maximum_ram_page(xc_interface *xch); +/** + * This function returns the total number of pages freed by using sharing + * on the system. For example, if two domains contain a single entry in + * their p2m map that points to the same shared page (and no other pages + * in the system are shared), then this function should return 1. + */ +long xc_sharing_freed_pages(xc_interface *xch); + +/** + * This function returns the total number of frames occupied by shared + * pages on the system. This is independent of the number of domains + * pointing at these frames. For example, in the above scenario this + * should return 1. The following should hold: + * memory usage without sharing = freed_pages + used_frames + */ +long xc_sharing_used_frames(xc_interface *xch); + /* Get current total pages allocated to a domain. */ long xc_get_tot_pages(xc_interface *xch, uint32_t domid);
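
A minimal caller of the two new wrappers (hypothetical standalone snippet):

#include <stdio.h>
#include <xenctrl.h>

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);

    if ( !xch )
        return 1;

    /* Both calls return counts of pages/frames, not bytes. */
    printf("freed pages: %ld\n", xc_sharing_freed_pages(xch));
    printf("used frames: %ld\n", xc_sharing_used_frames(xch));

    xc_interface_close(xch);
    return 0;
}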
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 11 of 12] Tools: Add a sharing command to xl for information about shared pages
docs/man/xl.pod.1 | 13 ++++++++ tools/libxl/libxl.c | 2 + tools/libxl/libxl_types.idl | 2 + tools/libxl/xl.h | 1 + tools/libxl/xl_cmdimpl.c | 66 +++++++++++++++++++++++++++++++++++++++++++++ tools/libxl/xl_cmdtable.c | 5 +++ 6 files changed, 89 insertions(+), 0 deletions(-) Also add the global sharing statistics to the libxl physinfo. This is a slight departure from libxc, but there''s no reason libxl physinfo can''t include extra bits of useful and relevant information. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r b596035ff0e2 -r 6522ae36bc61 docs/man/xl.pod.1 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -410,6 +410,19 @@ Leave domain running after creating the =back +=item B<sharing> [I<domain-id>] + +List count of shared pages. + +B<OPTIONS> + +=over 4 + +=item I<domain_id> + +List specifically for that domain. Otherwise, list for all domains. + +=back =item B<shutdown> [I<OPTIONS>] I<domain-id> diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/libxl.c --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2518,6 +2518,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, l physinfo->total_pages = xcphysinfo.total_pages; physinfo->free_pages = xcphysinfo.free_pages; physinfo->scrub_pages = xcphysinfo.scrub_pages; + physinfo->sharing_freed_pages = xc_sharing_freed_pages(ctx->xch); + physinfo->sharing_used_frames = xc_sharing_used_frames(ctx->xch); physinfo->nr_nodes = xcphysinfo.nr_nodes; memcpy(physinfo->hw_cap,xcphysinfo.hw_cap, sizeof(physinfo->hw_cap)); physinfo->phys_cap = xcphysinfo.capabilities; diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/libxl_types.idl --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -366,6 +366,8 @@ libxl_physinfo = Struct("physinfo", [ ("total_pages", uint64), ("free_pages", uint64), ("scrub_pages", uint64), + ("sharing_freed_pages", uint64), + ("sharing_used_frames", uint64), ("nr_nodes", uint32), ("hw_cap", libxl_hwcap), diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/xl.h --- a/tools/libxl/xl.h +++ b/tools/libxl/xl.h @@ -28,6 +28,7 @@ struct cmd_spec { int main_vcpulist(int argc, char **argv); int main_info(int argc, char **argv); +int main_sharing(int argc, char **argv); int main_cd_eject(int argc, char **argv); int main_cd_insert(int argc, char **argv); int main_console(int argc, char **argv); diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -3693,6 +3693,8 @@ static void output_physinfo(void) i = (1 << 20) / vinfo->pagesize; printf("total_memory : %"PRIu64"\n", info.total_pages / i); printf("free_memory : %"PRIu64"\n", info.free_pages / i); + printf("sharing_freed_memory : %"PRIu64"\n", info.sharing_freed_pages / i); + printf("sharing_used_memory : %"PRIu64"\n", info.sharing_used_frames / i); } if (!libxl_get_freecpus(ctx, &cpumap)) { libxl_for_each_cpu(i, cpumap) @@ -3776,6 +3778,70 @@ int main_info(int argc, char **argv) return 0; } +static void sharing(const libxl_dominfo *info, int nb_domain) +{ + int i; + + printf("Name ID Mem Shared\n"); + + for (i = 0; i < nb_domain; i++) { + char *domname; + unsigned shutdown_reason; + domname = libxl_domid_to_name(ctx, info[i].domid); + shutdown_reason = info[i].shutdown ? 
info[i].shutdown_reason : 0; + printf("%-40s %5d %5lu %5lu\n", + domname, + info[i].domid, + (unsigned long) (info[i].current_memkb / 1024), + (unsigned long) (info[i].shared_memkb / 1024)); + free(domname); + } +} + +int main_sharing(int argc, char **argv) +{ + int opt = 0; + libxl_dominfo info_buf; + libxl_dominfo *info, *info_free = NULL; + int nb_domain, rc; + + if ((opt = def_getopt(argc, argv, "", "sharing", 0)) != -1) + return opt; + + if (optind >= argc) { + info = libxl_list_domain(ctx, &nb_domain); + if (!info) { + fprintf(stderr, "libxl_domain_infolist failed.\n"); + return 1; + } + info_free = info; + } else if (optind == argc-1) { + find_domain(argv[optind]); + rc = libxl_domain_info(ctx, &info_buf, domid); + if (rc == ERROR_INVAL) { + fprintf(stderr, "Error: Domain \''%s\'' does not exist.\n", + argv[optind]); + return -rc; + } + if (rc) { + fprintf(stderr, "libxl_domain_info failed (code %d).\n", rc); + return -rc; + } + info = &info_buf; + nb_domain = 1; + } else { + help("sharing"); + return 2; + } + + sharing(info, nb_domain); + + if (info_free) + free(info_free); + + return 0; +} + static int sched_credit_domain_get( int domid, libxl_sched_credit *scinfo) { diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/xl_cmdtable.c --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -189,6 +189,11 @@ struct cmd_spec cmd_table[] = { "Get information about Xen host", "-n, --numa List host NUMA topology information", }, + { "sharing", + &main_sharing, 0, + "Get information about page sharing", + "[Domain]", + }, { "sched-credit", &main_sched_credit, 0, "Get/set credit scheduler parameters",
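
Sample output of the new command (domain names and figures invented; the
columns follow the printf format in sharing() above, with Mem and Shared in
MiB):

# xl sharing
Name                                        ID   Mem Shared
Domain-0                                     0  4096     0
debian-guest                                 1  1024   512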
Andres Lagar-Cavilla
2012-Jan-16 02:56 UTC
[PATCH 12 of 12] Memshrtool: tool to test and exercise the sharing subsystem
tools/tests/mem-sharing/Makefile | 26 +++++ tools/tests/mem-sharing/memshrtool.c | 165 +++++++++++++++++++++++++++++++++++ 2 files changed, 191 insertions(+), 0 deletions(-) This is demo code meant to showcase how to perform sharing operations. It is useful for testing. We submit it so others in the list can have unlimited sharing fun, but not necessarily expecting it be added to the tree. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 6522ae36bc61 -r 82e900cc7255 tools/tests/mem-sharing/Makefile --- /dev/null +++ b/tools/tests/mem-sharing/Makefile @@ -0,0 +1,26 @@ +XEN_ROOT=$(CURDIR)/../../.. +include $(XEN_ROOT)/tools/Rules.mk + +CFLAGS += -Werror + +CFLAGS += $(CFLAGS_libxenctrl) +CFLAGS += $(CFLAGS_xeninclude) + +TARGETS-y := +TARGETS-$(CONFIG_X86) += memshrtool +TARGETS := $(TARGETS-y) + +.PHONY: all +all: build + +.PHONY: build +build: $(TARGETS) + +.PHONY: clean +clean: + $(RM) *.o $(TARGETS) *~ $(DEPS) + +memshrtool: memshrtool.o + $(CC) -o $@ $< $(LDFLAGS) $(LDLIBS_libxenctrl) + +-include $(DEPS) diff -r 6522ae36bc61 -r 82e900cc7255 tools/tests/mem-sharing/memshrtool.c --- /dev/null +++ b/tools/tests/mem-sharing/memshrtool.c @@ -0,0 +1,165 @@ +/* + * memshrtool.c + * + * Copyright 2011 GridCentric Inc. (Adin Scannell, Andres Lagar-Cavilla) + */ + +#include <stdio.h> +#include <unistd.h> +#include <stdlib.h> +#include <errno.h> +#include <string.h> +#include <sys/mman.h> + +#include "xenctrl.h" + +static int usage(const char* prog) +{ + printf("usage: %s <command> [args...]\n", prog); + printf("where <command> may be:\n"); + printf(" info - Display total sharing info.\n"); + printf(" enable - Enable sharing on a domain.\n"); + printf(" disable - Disable sharing on a domain.\n"); + printf(" nominate <domid> <gfn> - Nominate a page for sharing.\n"); + printf(" share <domid> <gfn> <handle> <source> <source-gfn> <source-handle>\n"); + printf(" - Share two pages.\n"); + printf(" unshare <domid> <gfn> - Unshare a page by grabbing a writable map.\n"); + printf(" add-to-physmap <domid> <gfn> <source> <source-gfn> <source-handle>\n"); + printf(" - Populate a page in a domain with a shared page.\n"); + printf(" debug-gfn <domid> <gfn> - Debug a particular domain and gfn.\n"); + return 1; +} + +#define R(f) do { \ + int rc = f; \ + if ( rc < 0 ) { \ + printf("error executing %s: %s\n", #f, \ + ((errno * -1) == XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID) ? \ + "problem with client handle" :\ + ((errno * -1) == XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID) ? 
\ + "problem with source handle" : strerror(errno)); \ + return rc; \ + } \ +} while(0) + +int main(int argc, const char** argv) +{ + const char* cmd = NULL; + xc_interface *xch = xc_interface_open(0, 0, 0); + + if( argc < 2 ) + return usage(argv[0]); + + cmd = argv[1]; + + if( !strcasecmp(cmd, "info") ) + { + if( argc != 2 ) + return usage(argv[0]); + + printf("used = %ld\n", xc_sharing_used_frames(xch)); + printf("freed = %ld\n", xc_sharing_freed_pages(xch)); + } + else if( !strcasecmp(cmd, "enable") ) + { + domid_t domid; + + if( argc != 3 ) + return usage(argv[0]); + + domid = strtol(argv[2], NULL, 0); + R(xc_memshr_control(xch, domid, 1)); + } + else if( !strcasecmp(cmd, "disable") ) + { + domid_t domid; + + if( argc != 3 ) + return usage(argv[0]); + + domid = strtol(argv[2], NULL, 0); + R(xc_memshr_control(xch, domid, 0)); + } + else if( !strcasecmp(cmd, "nominate") ) + { + domid_t domid; + unsigned long gfn; + uint64_t handle; + + if( argc != 4 ) + return usage(argv[0]); + + domid = strtol(argv[2], NULL, 0); + gfn = strtol(argv[3], NULL, 0); + R(xc_memshr_nominate_gfn(xch, domid, gfn, &handle)); + printf("handle = 0x%08llx\n", (unsigned long long) handle); + } + else if( !strcasecmp(cmd, "share") ) + { + domid_t domid; + unsigned long gfn; + uint64_t handle; + domid_t source_domid; + unsigned long source_gfn; + uint64_t source_handle; + + if( argc != 8 ) + return usage(argv[0]); + + domid = strtol(argv[2], NULL, 0); + gfn = strtol(argv[3], NULL, 0); + handle = strtol(argv[4], NULL, 0); + source_domid = strtol(argv[5], NULL, 0); + source_gfn = strtol(argv[6], NULL, 0); + source_handle = strtol(argv[7], NULL, 0); + R(xc_memshr_share_gfns(xch, source_domid, source_gfn, source_handle, domid, gfn, handle)); + } + else if( !strcasecmp(cmd, "unshare") ) + { + domid_t domid; + unsigned long gfn; + void *map; + + if( argc != 4 ) + return usage(argv[0]); + + domid = strtol(argv[2], NULL, 0); + gfn = strtol(argv[3], NULL, 0); + map = xc_map_foreign_range(xch, domid, 4096, PROT_WRITE, gfn); + if( map ) + munmap(map, 4096); + R((int)!map); + } + else if( !strcasecmp(cmd, "add-to-physmap") ) + { + domid_t domid; + unsigned long gfn; + domid_t source_domid; + unsigned long source_gfn; + uint64_t source_handle; + + if( argc != 7 ) + return usage(argv[0]); + + domid = strtol(argv[2], NULL, 0); + gfn = strtol(argv[3], NULL, 0); + source_domid = strtol(argv[4], NULL, 0); + source_gfn = strtol(argv[5], NULL, 0); + source_handle = strtol(argv[6], NULL, 0); + R(xc_memshr_add_to_physmap(xch, source_domid, source_gfn, source_handle, domid, gfn)); + } + else if( !strcasecmp(cmd, "debug-gfn") ) + { + domid_t domid; + unsigned long gfn; + + if( argc != 4 ) + return usage(argv[0]); + + domid = strtol(argv[2], NULL, 0); + gfn = strtol(argv[3], NULL, 0); + R(xc_memshr_debug_gfn(xch, domid, gfn)); + } + + return 0; +}
Tim Deegan
2012-Jan-19 11:39 UTC
Re: [PATCH 01 of 12] x86/mm: Eliminate hash table in sharing code as index of shared mfns
Hi,

This seems basically fine; it's a logical progression from the interface
changes. A few comments inline below.

One other thing I noticed, which is independent of this patch, really:
I'm not sure that having a domain ID in the gfn_info is the right thing
to do. It seems like we could just carry a domain pointer.

At 21:56 -0500 on 15 Jan (1326664581), Andres Lagar-Cavilla wrote:
> The hashtable mechanism used by the sharing code is a bit odd. It seems
> unnecessary and adds complexity.

I think that's a bit harsh, since you removed its main use yourself! :)

> Instead, we replace this by storing a list
> head directly in the page info for the case when the page is shared. This does
> not add any extra space to the page_info and serves to remove significant
> complexity from sharing.
>
> Signed-off-by: Adin Scannell <adin@scannell.ca>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> diff -r 302a58874eb1 -r 2a4112172e44 xen/arch/x86/mm/mem_sharing.c
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -3,6 +3,7 @@
>   *
>   * Memory sharing support.
>   *
> + * Copyright (c) 2011 GridCentric, Inc. (Adin Scannell & Andres Lagar-Cavilla)
>   * Copyright (c) 2009 Citrix Systems, Inc. (Grzegorz Milos)
>   *
>   * This program is free software; you can redistribute it and/or modify
> @@ -34,15 +35,30 @@
>
>  #include "mm-locks.h"
>
> -/* Auditing of memory sharing code? */
> -#define MEM_SHARING_AUDIT 0
> -
>  #if MEM_SHARING_AUDIT
> -static void mem_sharing_audit(void);
> +static int mem_sharing_audit(void);

Hmmm. You haven't made any of the callers use this new return value. :(
Better to leave it as it was.

>  struct page_info
>  {
>      union {
> @@ -49,8 +51,13 @@ struct page_info
>          /* For non-pinnable single-page shadows, a higher entry that points
>           * at us. */
>          paddr_t up;
> -        /* For shared/sharable pages the sharing handle */
> -        uint64_t shr_handle;
> +        /* For shared/sharable pages, we use a doubly-linked list
> +         * of all the {pfn,domain} pairs that map this page. We also include
> +         * an opaque handle, which is effectively a version, so that clients
> +         * of sharing share the version they expect to.
> +         * This list is allocated and freed when a page is shared/unshared.
> +         */
> +        struct page_sharing_info *shared_info;

Can you call this field 'sharing' instead? 'shared-info' makes me think
of the per-domain shared_info page, and tripped me up a few times
reading this patch. :)

Cheers,

Tim.
Tim Deegan
2012-Jan-19 11:56 UTC
Re: [PATCH 02 of 12] x86/mm: Update mem sharing interface to (re)allow sharing of grants
Hi,

At 21:56 -0500 on 15 Jan (1326664582), Andres Lagar-Cavilla wrote:
> Previously, the mem sharing code would return an opaque handle to
> index shared pages (and nominees) in its global hash table. By
> removing the hash table, the new interface requires a gfn and a
> version. However, when sharing grants, the caller provides a grant ref
> and a version. Update the interface to handle this case.

I'm not sure I understand the motivation. Is the idea that some HVM
domUs have granted memory to, let's call it domS, and domS then notices
that the contents are the same and tells Xen to share the original
frames?

In any case, the interface change should have some documentation to
match it. :)

> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Signed-off-by: Adin Scannell <adin@scannell.ca>
>
> diff -r 2a4112172e44 -r e0d8333e4725 xen/arch/x86/mm/mem_sharing.c
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -733,18 +733,57 @@ int mem_sharing_domctl(struct domain *d,
>
>      case XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE:
>      {
> -        unsigned long sgfn = mec->u.share.source_gfn;
> -        shr_handle_t sh = mec->u.share.source_handle;
> -        struct domain *cd = get_domain_by_id(mec->u.share.client_domain);
> -        if ( cd )
> +        unsigned long sgfn, cgfn;
> +        struct domain *cd;
> +        shr_handle_t sh, ch;
> +
> +        if ( !mem_sharing_enabled(d) )
> +            return -EINVAL;
> +
> +        cd = get_domain_by_id(mec->u.share.client_domain);
> +        if ( !cd )
> +            return -ESRCH;
> +
> +        if ( !mem_sharing_enabled(cd) )
>          {
> -            unsigned long cgfn = mec->u.share.client_gfn;
> -            shr_handle_t ch = mec->u.share.client_handle;
> -            rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch);
>              put_domain(cd);
> +            return -EINVAL;
>          }
> -        else
> -            return -EEXIST;
> +
> +        if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.source_gfn) )
> +        {
> +            grant_ref_t gref = (grant_ref_t)
> +                                (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
> +                                    mec->u.share.source_gfn));
> +            if ( mem_sharing_gref_to_gfn(d, gref, &sgfn) < 0 )
> +            {
> +                put_domain(cd);
> +                return -EINVAL;
> +            }
> +        } else {
> +            sgfn = mec->u.share.source_gfn;
> +        }
> +
> +        if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) )
> +        {
> +            grant_ref_t gref = (grant_ref_t)
> +                                (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
> +                                    mec->u.share.client_gfn));
> +            if ( mem_sharing_gref_to_gfn(cd, gref, &cgfn) < 0 )
> +            {
> +                put_domain(cd);
> +                return -EINVAL;
> +            }
> +        } else {
> +            cgfn = mec->u.share.client_gfn;
> +        }
> +
> +        sh = mec->u.share.source_handle;
> +        ch = mec->u.share.client_handle;
> +
> +        rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch);
> +
> +        put_domain(cd);
>      }
>      break;
>
> diff -r 2a4112172e44 -r e0d8333e4725 xen/include/public/domctl.h
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -775,6 +775,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_mem_e
>  #define XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID  (-10)
>  #define XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID  (-9)
>
> +#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG   (1ULL << 62)
> +
> +#define XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(field, val)  \
> +        (field) = (XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG | val)
> +#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(field)         \
> +        ((field) & XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG)
> +#define XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(field)        \
> +        ((field) & (~XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG))
> +
>  struct xen_domctl_mem_sharing_op {
>      uint8_t op; /* XEN_DOMCTL_MEM_EVENT_OP_* */
Tim Deegan
2012-Jan-19 12:13 UTC
Re: [PATCH 03 of 12] x86/mm: Add per-page locking for memory sharing, when audits are disabled
At 21:56 -0500 on 15 Jan (1326664583), Andres Lagar-Cavilla wrote:
>  xen/arch/x86/mm.c             |   74 +-----------
>  xen/arch/x86/mm/mem_sharing.c |  257 ++++++++++++++++++++++++++++++++++++++---
>  xen/arch/x86/mm/mm-locks.h    |   15 +-
>  xen/include/asm-x86/mm.h      |   27 +++-
>  4 files changed, 275 insertions(+), 98 deletions(-)
>
>
> With the removal of the hash table, all that is needed now is locking
> of individual shared pages, as new (gfn,domain) pairs are removed from
> or added to the list of mappings.
>
> We recycle PGT_locked and use it to lock individual pages. We ensure deadlock
> is averted by locking pages in increasing order.
>
> The global lock remains for the benefit of the auditing code, and is
> thus enabled only as a compile-time option.

I think that having the audit-enabled config disable all the
fine-grained locking is asking for heisenbugs. (But maybe that goes
away in your later RCU patch, which I've yet to read.)

Otherwise, looks good. Oh, and:

> +#if MEM_SHARING_AUDIT
>  /* Page-sharing lock (global)
>   *
>   * A single global lock that protects the memory-sharing code's
>   * hash tables. */

This comment should probably have been updated in the patch that killed
the hash table. :)

Tim.
Ian Campbell
2012-Jan-19 12:14 UTC
Re: [PATCH 11 of 12] Tools: Add a sharing command to xl for information about shared pages
On Mon, 2012-01-16 at 02:56 +0000, Andres Lagar-Cavilla wrote:> docs/man/xl.pod.1 | 13 ++++++++ > tools/libxl/libxl.c | 2 + > tools/libxl/libxl_types.idl | 2 + > tools/libxl/xl.h | 1 + > tools/libxl/xl_cmdimpl.c | 66 +++++++++++++++++++++++++++++++++++++++++++++ > tools/libxl/xl_cmdtable.c | 5 +++ > 6 files changed, 89 insertions(+), 0 deletions(-) > > > Also add the global sharing statistics to the libxl physinfo. This is a slight > departure from libxc, but there''s no reason libxl physinfo can''t include extra > bits of useful and relevant information. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > Signed-off-by: Adin Scannell <adin@scannell.ca>Acked-by: Ian Campbell <ian.campbell@citrix.com>> > diff -r b596035ff0e2 -r 6522ae36bc61 docs/man/xl.pod.1 > --- a/docs/man/xl.pod.1 > +++ b/docs/man/xl.pod.1 > @@ -410,6 +410,19 @@ Leave domain running after creating the > > =back > > +=item B<sharing> [I<domain-id>] > + > +List count of shared pages. > + > +B<OPTIONS> > + > +=over 4 > + > +=item I<domain_id> > + > +List specifically for that domain. Otherwise, list for all domains. > + > +=back > > =item B<shutdown> [I<OPTIONS>] I<domain-id> > > diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/libxl.c > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -2518,6 +2518,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, l > physinfo->total_pages = xcphysinfo.total_pages; > physinfo->free_pages = xcphysinfo.free_pages; > physinfo->scrub_pages = xcphysinfo.scrub_pages; > + physinfo->sharing_freed_pages = xc_sharing_freed_pages(ctx->xch); > + physinfo->sharing_used_frames = xc_sharing_used_frames(ctx->xch); > physinfo->nr_nodes = xcphysinfo.nr_nodes; > memcpy(physinfo->hw_cap,xcphysinfo.hw_cap, sizeof(physinfo->hw_cap)); > physinfo->phys_cap = xcphysinfo.capabilities; > diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/libxl_types.idl > --- a/tools/libxl/libxl_types.idl > +++ b/tools/libxl/libxl_types.idl > @@ -366,6 +366,8 @@ libxl_physinfo = Struct("physinfo", [ > ("total_pages", uint64), > ("free_pages", uint64), > ("scrub_pages", uint64), > + ("sharing_freed_pages", uint64), > + ("sharing_used_frames", uint64), > > ("nr_nodes", uint32), > ("hw_cap", libxl_hwcap), > diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/xl.h > --- a/tools/libxl/xl.h > +++ b/tools/libxl/xl.h > @@ -28,6 +28,7 @@ struct cmd_spec { > > int main_vcpulist(int argc, char **argv); > int main_info(int argc, char **argv); > +int main_sharing(int argc, char **argv); > int main_cd_eject(int argc, char **argv); > int main_cd_insert(int argc, char **argv); > int main_console(int argc, char **argv); > diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -3693,6 +3693,8 @@ static void output_physinfo(void) > i = (1 << 20) / vinfo->pagesize; > printf("total_memory : %"PRIu64"\n", info.total_pages / i); > printf("free_memory : %"PRIu64"\n", info.free_pages / i); > + printf("sharing_freed_memory : %"PRIu64"\n", info.sharing_freed_pages / i); > + printf("sharing_used_memory : %"PRIu64"\n", info.sharing_used_frames / i); > } > if (!libxl_get_freecpus(ctx, &cpumap)) { > libxl_for_each_cpu(i, cpumap) > @@ -3776,6 +3778,70 @@ int main_info(int argc, char **argv) > return 0; > } > > +static void sharing(const libxl_dominfo *info, int nb_domain) > +{ > + int i; > + > + printf("Name ID Mem Shared\n"); > + > + for (i = 0; i < nb_domain; i++) { > + char *domname; > + unsigned shutdown_reason; > + domname = libxl_domid_to_name(ctx, info[i].domid); > 
> +        shutdown_reason = info[i].shutdown ? info[i].shutdown_reason : 0;
> +        printf("%-40s %5d %5lu %5lu\n",
> +               domname,
> +               info[i].domid,
> +               (unsigned long) (info[i].current_memkb / 1024),
> +               (unsigned long) (info[i].shared_memkb / 1024));
> +        free(domname);
> +    }
> +}
> +
> +int main_sharing(int argc, char **argv)
> +{
> +    int opt = 0;
> +    libxl_dominfo info_buf;
> +    libxl_dominfo *info, *info_free = NULL;
> +    int nb_domain, rc;
> +
> +    if ((opt = def_getopt(argc, argv, "", "sharing", 0)) != -1)
> +        return opt;
> +
> +    if (optind >= argc) {
> +        info = libxl_list_domain(ctx, &nb_domain);
> +        if (!info) {
> +            fprintf(stderr, "libxl_domain_infolist failed.\n");
> +            return 1;
> +        }
> +        info_free = info;
> +    } else if (optind == argc-1) {
> +        find_domain(argv[optind]);
> +        rc = libxl_domain_info(ctx, &info_buf, domid);
> +        if (rc == ERROR_INVAL) {
> +            fprintf(stderr, "Error: Domain \'%s\' does not exist.\n",
> +                argv[optind]);
> +            return -rc;
> +        }
> +        if (rc) {
> +            fprintf(stderr, "libxl_domain_info failed (code %d).\n", rc);
> +            return -rc;
> +        }
> +        info = &info_buf;
> +        nb_domain = 1;
> +    } else {
> +        help("sharing");
> +        return 2;
> +    }
> +
> +    sharing(info, nb_domain);
> +
> +    if (info_free)
> +        free(info_free);
> +
> +    return 0;
> +}
> +
> static int sched_credit_domain_get(
>     int domid, libxl_sched_credit *scinfo)
> {
> diff -r b596035ff0e2 -r 6522ae36bc61 tools/libxl/xl_cmdtable.c
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -189,6 +189,11 @@ struct cmd_spec cmd_table[] = {
>       "Get information about Xen host",
>       "-n, --numa List host NUMA topology information",
>     },
> +    { "sharing",
> +      &main_sharing, 0,
> +      "Get information about page sharing",
> +      "[Domain]",
> +    },
>     { "sched-credit",
>       &main_sched_credit, 0,
>       "Get/set credit scheduler parameters",
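For reference, a hypothetical session with the new command (domain names, IDs and numbers are invented for illustration; Mem and Shared are MiB derived from current_memkb and shared_memkb as in the code above):

    # xl sharing
    Name                                        ID   Mem Shared
    Domain-0                                     0  2048     0
    clone-1                                      1   512   384
    clone-2                                      2   512   384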
Tim Deegan
2012-Jan-19 12:18 UTC
Re: [PATCH 04 of 12] x86/mm: Enforce lock ordering for sharing page locks
At 21:56 -0500 on 15 Jan (1326664584), Andres Lagar-Cavilla wrote:
> xen/arch/x86/mm/mem_sharing.c | 91 ++++++++++++++++++++++++++----------------
> xen/arch/x86/mm/mm-locks.h | 18 ++++++++-
> xen/include/asm-x86/mm.h | 3 +-
> 3 files changed, 76 insertions(+), 36 deletions(-)
>
>
> Use the ordering constructs in mm-locks.h to enforce an order
> for the p2m and page locks in the sharing code. Applies to either
> the global sharing lock (in audit mode) or the per page locks.

Good idea, though there's rather a lot of passing stack pointers around
now. Given that you should only ever be taking one of these paths at a
time, and you should never be waitq'd with a lock held, maybe you could
just have a static per-pcpu pg_lock_data struct?

Cheers,

Tim.
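The per-pcpu alternative Tim suggests could be as small as the following sketch, assuming the pg_lock_data type from this series; the variable and helper names are invented, and the per-cpu macros are the standard Xen ones:

    /* One pg_lock_data per physical CPU, instead of a stack variable
     * threaded through every caller.  This is valid only if a pcpu
     * never waits with a page lock held, so at most one page-locking
     * path is active per pcpu at any time. */
    static DEFINE_PER_CPU(struct pg_lock_data, __pld);

    static inline struct pg_lock_data *get_pg_lock_data(void)
    {
        return &this_cpu(__pld);
    }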
Tim Deegan
2012-Jan-19 12:53 UTC
Re: [PATCH 07 of 12] Add the ability to poll stats about shared memory via the console
At 21:56 -0500 on 15 Jan (1326664587), Andres Lagar-Cavilla wrote:
> xen/arch/ia64/xen/mm.c | 5 +++++
> xen/arch/x86/mm.c | 14 ++++++++++++++
> xen/common/keyhandler.c | 7 +++++--
> xen/include/xen/mm.h | 3 +++
> 4 files changed, 27 insertions(+), 2 deletions(-)
>
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Signed-off-by: Adin Scannell <adin@scannell.ca>

Acked-by: Tim Deegan <tim@xen.org>
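Given the files touched (keyhandler.c plus an arch stub on both x86 and ia64), the likely shape is an arch hook called from the existing domain-dump keyhandler that prints the two global sharing counters. A sketch, with the function names as informed guesses rather than quotes from the patch:

    void arch_dump_shared_mem_info(void)
    {
        /* Counters maintained by mem_sharing.c; reading them takes no
         * lock, so the printed values are only a snapshot. */
        printk("Shared frames %d -- Saved frames %d\n",
               mem_sharing_get_nr_shared_mfns(),
               mem_sharing_get_nr_saved_mfns());
    }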
Tim Deegan
2012-Jan-19 12:59 UTC
Re: [PATCH 08 of 12] x86/mm: use RCU in mem sharing audit list, eliminate global lock completely
At 21:56 -0500 on 15 Jan (1326664588), Andres Lagar-Cavilla wrote:
> xen/arch/x86/mm/mem_sharing.c | 94 +++++++++++++++++---------------
> xen/arch/x86/mm/mm-locks.h | 18 -------
> xen/include/asm-x86/mem_sharing.h | 1 +
> 3 files changed, 43 insertions(+), 70 deletions(-)
>
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

Acked-by: Tim Deegan <tim@xen.org>

I suspect you'll get errors because this

> @@ -215,11 +215,13 @@ static int mem_sharing_audit(void)
>     unsigned long count_expected;
>     unsigned long count_found = 0;
>     struct list_head *ae;
> +   DECLARE_PG_LOCK_DATA(pld);
>
> -   ASSERT(shr_locked_by_me());
>     count_expected = atomic_read(&nr_shared_mfns);
>

is no longer protected by a lock, but since they're not fatal, that's OK.

Tim.
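The audit walk that replaces the global lock presumably has roughly this shape; a minimal sketch assuming the shr_audit_list and page_sharing_info names from patch 01, not the literal code:

    static struct list_head shr_audit_list;
    static DEFINE_RCU_READ_LOCK(shr_audit_read_lock);

    static int mem_sharing_audit(void)
    {
        int errors = 0;
        struct page_sharing_info *shared;

        rcu_read_lock(&shr_audit_read_lock);
        /* Writers update the list with list_add_rcu/list_del_rcu, so a
         * reader can race with share/unshare; hence the tolerated,
         * non-fatal counter mismatches discussed above. */
        list_for_each_entry_rcu(shared, &shr_audit_list, entry)
        {
            /* per-page checks, under that page's own lock, go here */
        }
        rcu_read_unlock(&shr_audit_read_lock);

        return errors;
    }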
Tim Deegan
2012-Jan-19 13:02 UTC
Re: [PATCH 03 of 12] x86/mm: Add per-page locking for memory sharing, when audits are disabled
At 21:56 -0500 on 15 Jan (1326664583), Andres Lagar-Cavilla wrote:
> @@ -510,26 +684,63 @@ int mem_sharing_share_pages(struct domai
>     smfn = get_gfn(sd, sgfn, &smfn_type);
>     cmfn = get_gfn(cd, cgfn, &cmfn_type);
>
> -   ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID;
> -   spage = mem_sharing_lookup(mfn_x(smfn));
> -   if ( spage == NULL )
> +   /* This tricky business is to avoid two callers deadlocking if
> +    * grabbing pages in opposite client/source order */

I think you need to delete the XXX comment just above. :)

Tim.
Andres Lagar-Cavilla
2012-Jan-19 13:03 UTC
Re: [PATCH 08 of 12] x86/mm: use RCU in mem sharing audit list, eliminate global lock completely
> At 21:56 -0500 on 15 Jan (1326664588), Andres Lagar-Cavilla wrote:
>> xen/arch/x86/mm/mem_sharing.c | 94
>> +++++++++++++++++---------------------
>> xen/arch/x86/mm/mm-locks.h | 18 -------
>> xen/include/asm-x86/mem_sharing.h | 1 +
>> 3 files changed, 43 insertions(+), 70 deletions(-)
>>
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> Acked-by: Tim Deegan <tim@xen.org>
>
> I suspect you'll get errors because this
>
>> @@ -215,11 +215,13 @@ static int mem_sharing_audit(void)
>>     unsigned long count_expected;
>>     unsigned long count_found = 0;
>>     struct list_head *ae;
>> +   DECLARE_PG_LOCK_DATA(pld);
>>
>> -   ASSERT(shr_locked_by_me());
>>     count_expected = atomic_read(&nr_shared_mfns);
>>
>
> is no longer protected by a lock, but since they're not fatal, that's
> OK.

Absolutely. We can live with those, though, and the intent is mostly to
stare at that console and see if any of the more horrible conditions pop
up.

Once you're done with the series I'll try to address all/most of your
concerns and resend a batch update. Hopefully by 11am EST.

Thanks,
Andres

>
> Tim.
>
>
Tim Deegan
2012-Jan-19 13:04 UTC
Re: [PATCH 06 of 12] x86/mm: New domctl: add a shared page to the physmap
Hi,

At 21:56 -0500 on 15 Jan (1326664586), Andres Lagar-Cavilla wrote:
> +int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
> +                               struct domain *cd, unsigned long cgfn)
> +{
> +    struct page_info *spage;
> +    int ret = -EINVAL;
> +    mfn_t smfn, cmfn;
> +    p2m_type_t smfn_type, cmfn_type;
> +    struct gfn_info *gfn_info;
> +    struct p2m_domain *p2m = p2m_get_hostp2m(cd);
> +    p2m_access_t a;
> +    DECLARE_PG_LOCK_DATA(pld);
> +
> +    /* XXX if sd == cd handle potential deadlock by ordering
> +     * the get_ and put_gfn's */

You fixed this in the other case; please fix this one too. :)

Otherwise this looks OK.

Cheers,

Tim.
Andres Lagar-Cavilla
2012-Jan-19 13:06 UTC
Re: [PATCH 03 of 12] x86/mm: Add per-page locking for memory sharing, when audits are disabled
> At 21:56 -0500 on 15 Jan (1326664583), Andres Lagar-Cavilla wrote:
>> @@ -510,26 +684,63 @@ int mem_sharing_share_pages(struct domai
>>     smfn = get_gfn(sd, sgfn, &smfn_type);
>>     cmfn = get_gfn(cd, cgfn, &cmfn_type);
>>
>> -   ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID;
>> -   spage = mem_sharing_lookup(mfn_x(smfn));
>> -   if ( spage == NULL )
>> +   /* This tricky business is to avoid two callers deadlocking if
>> +    * grabbing pages in opposite client/source order */
>
> I think you need to delete the XXX comment just above. :)
>

Well, that XXX comment applies if get_gfn locks the p2m, and then what
we're grabbing in mem_sharing_lookup are locks per mfn, essentially. To
prevent deadlock, we will want to sort those p2m locks by ascending
domain, and then by ascending gfn within the same domain, if and when
the p2m lock is broken into fine-grained bits.

That is surely a battle for another day, I hope!

Andres

> Tim.
>
>
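If that fine-grained split ever happens, the canonical ordering Andres describes would look something like the sketch below. It is purely illustrative: lock_gfn is a hypothetical helper, not anything in this series.

    /* Take two hypothetical per-gfn p2m locks in a global order:
     * ascending domain id first, then ascending gfn within the same
     * domain, so two concurrent sharers can never hold them in
     * opposite order. */
    static void lock_two_gfns(struct domain *da, unsigned long ga,
                              struct domain *db, unsigned long gb)
    {
        int a_first = (da->domain_id < db->domain_id) ||
                      ((da == db) && (ga <= gb));

        if ( a_first )
        {
            lock_gfn(da, ga);
            if ( (da != db) || (ga != gb) )
                lock_gfn(db, gb);
        }
        else
        {
            lock_gfn(db, gb);
            lock_gfn(da, ga);
        }
    }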
Andres Lagar-Cavilla
2012-Jan-19 13:09 UTC
Re: [PATCH 06 of 12] x86/mm: New domctl: add a shared page to the physmap
> Hi,
>
> At 21:56 -0500 on 15 Jan (1326664586), Andres Lagar-Cavilla wrote:
>> +int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn,
>> shr_handle_t sh,
>> +                               struct domain *cd, unsigned long cgfn)
>> +{
>> +    struct page_info *spage;
>> +    int ret = -EINVAL;
>> +    mfn_t smfn, cmfn;
>> +    p2m_type_t smfn_type, cmfn_type;
>> +    struct gfn_info *gfn_info;
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(cd);
>> +    p2m_access_t a;
>> +    DECLARE_PG_LOCK_DATA(pld);
>> +
>> +    /* XXX if sd == cd handle potential deadlock by ordering
>> +     * the get_ and put_gfn's */
>
> You fixed this in the other case; please fix this one too. :)

Maybe this stems from the same confusion as the previous one. We'd
still risk deadlocks on get_gfn. But the add to physmap call only
needs to lock a single mfn.

Andres

>
> Otherwise this looks OK.
>
> Cheers,
>
> Tim.
>
>
Andres Lagar-Cavilla
2012-Jan-19 13:12 UTC
Re: [PATCH 01 of 12] x86/mm: Eliminate hash table in sharing code as index of shared mfns
> Hi,
>
> This seems basically fine; it's a logical progression from the interface
> changes. A few comments inline below.
>
> One other thing I noticed, which is independent of this patch, really:
> I'm not sure that having a domain ID in the gfn_info is the right
> thing to do. It seems like we could just carry a domain pointer.

I was wary that accessing the pointer directly might be unsafe if the
domain is gone, and it was better to search the domain on a list when
we needed to.

Other stuff I will fix.

Andres

>
>
> At 21:56 -0500 on 15 Jan (1326664581), Andres Lagar-Cavilla wrote:
>> The hashtable mechanism used by the sharing code is a bit odd. It seems
>> unnecessary and adds complexity.
>
> I think that's a bit harsh, since you removed its main use yourself! :)
>
>> Instead, we replace this by storing a list
>> head directly in the page info for the case when the page is shared.
>> This does
>> not add any extra space to the page_info and serves to remove
>> significant
>> complexity from sharing.
>>
>> Signed-off-by: Adin Scannell <adin@scannell.ca>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>
>> diff -r 302a58874eb1 -r 2a4112172e44 xen/arch/x86/mm/mem_sharing.c
>> --- a/xen/arch/x86/mm/mem_sharing.c
>> +++ b/xen/arch/x86/mm/mem_sharing.c
>> @@ -3,6 +3,7 @@
>>  *
>>  * Memory sharing support.
>>  *
>> + * Copyright (c) 2011 GridCentric, Inc. (Adin Scannell & Andres
>> Lagar-Cavilla)
>>  * Copyright (c) 2009 Citrix Systems, Inc. (Grzegorz Milos)
>>  *
>>  * This program is free software; you can redistribute it and/or modify
>> @@ -34,15 +35,30 @@
>>
>>  #include "mm-locks.h"
>>
>> -/* Auditing of memory sharing code? */
>> -#define MEM_SHARING_AUDIT 0
>> -
>>  #if MEM_SHARING_AUDIT
>> -static void mem_sharing_audit(void);
>> +static int mem_sharing_audit(void);
>
> Hmmm. You haven't made any of the callers use this new return value. :(
> Better to leave it as it was.
>
>> struct page_info
>> {
>>     union {
>> @@ -49,8 +51,13 @@ struct page_info
>>         /* For non-pinnable single-page shadows, a higher entry that
>> points
>>          * at us. */
>>         paddr_t up;
>> -       /* For shared/sharable pages the sharing handle */
>> -       uint64_t shr_handle;
>> +       /* For shared/sharable pages, we use a doubly-linked list
>> +        * of all the {pfn,domain} pairs that map this page. We also
>> include
>> +        * an opaque handle, which is effectively a version, so that
>> clients
>> +        * of sharing share the version they expect to.
>> +        * This list is allocated and freed when a page is
>> shared/unshared.
>> +        */
>> +       struct page_sharing_info *shared_info;
>
> Can you call this field 'sharing' instead? 'shared-info' makes me
> think of the per-domain shared_info page, and tripped me up a few times
> reading this patch. :)
>
> Cheers,
>
> Tim.
>
>
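The trade-off Andres describes, storing a domid and resolving it on demand rather than caching a pointer that could dangle, amounts to the following pattern. The gfn_info field matches patch 01; the surrounding function is invented for illustration:

    static int visit_gfn_owner(gfn_info_t *g)
    {
        /* Resolve the stored domid each time: takes a reference on
         * the domain, and fails safely if it has been destroyed. */
        struct domain *d = get_domain_by_id(g->domain);

        if ( d == NULL )
            return -ESRCH;  /* stale entry: the owner is gone */

        /* ... operate on d (e.g. its shr_pages count) and the gfn ... */

        put_domain(d);
        return 0;
    }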
Tim Deegan
2012-Jan-19 13:13 UTC
Re: [PATCH 03 of 12] x86/mm: Add per-page locking for memory sharing, when audits are disabled
At 05:06 -0800 on 19 Jan (1326949574), Andres Lagar-Cavilla wrote:
> > At 21:56 -0500 on 15 Jan (1326664583), Andres Lagar-Cavilla wrote:
> >> @@ -510,26 +684,63 @@ int mem_sharing_share_pages(struct domai
> >>     smfn = get_gfn(sd, sgfn, &smfn_type);
> >>     cmfn = get_gfn(cd, cgfn, &cmfn_type);
> >>
> >> -   ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID;
> >> -   spage = mem_sharing_lookup(mfn_x(smfn));
> >> -   if ( spage == NULL )
> >> +   /* This tricky business is to avoid two callers deadlocking if
> >> +    * grabbing pages in opposite client/source order */
> >
> > I think you need to delete the XXX comment just above. :)
> >
> Well, that XXX comment applies if get_gfn locks the p2m, and then what
> we're grabbing in mem_sharing_lookup are locks per mfn, essentially. To

Oh, I see - wrong deadlock! :)

Tim.
Tim Deegan
2012-Jan-19 13:14 UTC
Re: [PATCH 08 of 12] x86/mm: use RCU in mem sharing audit list, eliminate global lock completely
At 05:03 -0800 on 19 Jan (1326949413), Andres Lagar-Cavilla wrote:
> > At 21:56 -0500 on 15 Jan (1326664588), Andres Lagar-Cavilla wrote:
> >> xen/arch/x86/mm/mem_sharing.c | 94
> >> +++++++++++++++++---------------------
> >> xen/arch/x86/mm/mm-locks.h | 18 -------
> >> xen/include/asm-x86/mem_sharing.h | 1 +
> >> 3 files changed, 43 insertions(+), 70 deletions(-)
> >>
> >>
> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> >
> > Acked-by: Tim Deegan <tim@xen.org>
> >
> > I suspect you'll get errors because this
> >
> >> @@ -215,11 +215,13 @@ static int mem_sharing_audit(void)
> >>     unsigned long count_expected;
> >>     unsigned long count_found = 0;
> >>     struct list_head *ae;
> >> +   DECLARE_PG_LOCK_DATA(pld);
> >>
> >> -   ASSERT(shr_locked_by_me());
> >>     count_expected = atomic_read(&nr_shared_mfns);
> >>
> >
> > is no longer protected by a lock, but since they're not fatal, that's
> > OK.
> Absolutely. We can live with those, though, and the intent is mostly to
> stare at that console and see if any of the more horrible conditions pop
> up.
>
> Once you're done with the series I'll try to address all/most of your
> concerns and resend a batch update. Hopefully by 11am EST.

I've said all I have to say for now; I'm unlikely to have time for a
second pass at it today, though. :)

Tim.
On Mon, 2012-01-16 at 02:56 +0000, Andres Lagar-Cavilla wrote:
> tools/blktap2/drivers/tapdisk-vbd.c | 6 +-
> tools/blktap2/drivers/tapdisk.h | 2 +-
> tools/libxc/xc_memshr.c | 63 ++++++++++++++++++++++++++++++++++--
> tools/libxc/xenctrl.h | 19 ++++++++++-
> tools/memshr/bidir-daemon.c | 34 ++++++++++++++-----
> tools/memshr/bidir-hash.h | 12 ++++--
> tools/memshr/interface.c | 30 ++++++++++-------
> tools/memshr/memshr.h | 11 +++++-
> 8 files changed, 139 insertions(+), 38 deletions(-)
>
>
> This patch is the folded version of API updates, along with the associated tool
> changes to ensure that the build is always consistent.
>
> API updates:
> - The source domain in the sharing calls is no longer assumed to be dom0.
> - Previously, the mem sharing code would return an opaque handle to index
>   shared pages (and nominees) in its global hash table. By removing the hash
>   table, the handle becomes a version, to avoid sharing a stale version of a
>   page. Thus, libxc wrappers and tools need to be updated to recall the share
>   functions with the information needed to fetch the page (which they readily
>   have).
>
> Tool updates:
> The only (in-tree, that we know of) consumer of the mem sharing API is the
> memshr tool. This is updated to use the new API.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Signed-off-by: Adin Scannell <adin@scannell.ca>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

at least as far as the libxc bits go. I don't know much about the
memshr/blktap side of things but the changes look sane to me.
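Since the handle is now a version, a client that races with another sharing operation has to be prepared to refetch handles and retry. A sketch of that pattern against the updated libxc wrappers; the signatures are abbreviated from this series and should be treated as assumptions, not a definitive interface:

    static int share_pages_retry(xc_interface *xch,
                                 domid_t sdom, unsigned long sgfn,
                                 domid_t cdom, unsigned long cgfn)
    {
        uint64_t sh, ch;
        int tries;

        for ( tries = 0; tries < 5; tries++ )
        {
            /* Nominate both gfns to obtain their current versions. */
            if ( xc_memshr_nominate_gpfn(xch, sdom, sgfn, &sh) ||
                 xc_memshr_nominate_gpfn(xch, cdom, cgfn, &ch) )
                return -1;

            /* Share; this fails if either page changed (i.e. its
             * version went stale) between nominate and share. */
            if ( xc_memshr_share_gfns(xch, sdom, sgfn, sh,
                                      cdom, cgfn, ch) == 0 )
                return 0;
        }
        return -1;
    }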