This patch series proposes an overhaul of the memory sharing code. Aside from bug fixes and cleanups, the main features are:

- Polling of stats via libxc, libxl and console
- Removal of the global sharing hashtable and the global sharing lock (if audit is disabled)
- Turning sharing audits into a domctl
- A new domctl to populate vacant physmap entries with shared pages

As a result, the domctl interface to sharing changes. The only in-tree consumer of this interface is updated in the current series. If any out-of-tree consumers exist, it is important that they state their opinion on this interface change.

Patches 5 to 8, 10, 11, 15 and 18 are tools patches.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>

 xen/arch/x86/mm.c | 6 +-
 xen/arch/x86/mm/mem_sharing.c | 91 +++--
 xen/arch/x86/mm.c | 2 +-
 xen/arch/x86/mm/mem_sharing.c | 526 +++++++++++++++++----------------
 xen/include/asm-x86/mem_sharing.h | 15 +-
 xen/include/asm-x86/mm.h | 11 +-
 xen/include/public/domctl.h | 3 +
 xen/arch/x86/mm/mem_sharing.c | 65 +++-
 xen/include/public/domctl.h | 9 +
 tools/libxc/xc_memshr.c | 3 +-
 tools/libxc/xenctrl.h | 1 +
 tools/libxc/xc_memshr.c | 19 +-
 tools/libxc/xenctrl.h | 7 +-
 tools/blktap2/drivers/Makefile | 2 +-
 tools/blktap2/drivers/tapdisk-image.c | 2 +-
 tools/blktap2/drivers/tapdisk-vbd.c | 6 +-
 tools/blktap2/drivers/tapdisk.h | 6 +-
 tools/memshr/bidir-daemon.c | 4 +
 tools/memshr/bidir-hash.h | 13 +-
 tools/memshr/interface.c | 31 +-
 tools/memshr/memshr.h | 11 +-
 tools/libxl/xl.h | 1 +
 tools/libxl/xl_cmdimpl.c | 85 +++++
 tools/libxl/xl_cmdtable.c | 6 +
 xen/arch/x86/mm.c | 6 -
 xen/arch/x86/mm/mem_sharing.c | 31 +-
 xen/arch/x86/x86_64/compat/mm.c | 6 +
 xen/arch/x86/x86_64/mm.c | 7 +
 xen/include/asm-x86/mem_sharing.h | 1 +
 xen/include/public/memory.h | 1 +
 tools/libxc/xc_private.c | 10 +
 tools/libxc/xenctrl.h | 4 +
 tools/libxl/Makefile | 2 +-
 tools/libxl/xl_cmdimpl.c | 13 +-
 xen/arch/x86/mm.c | 4 +-
 xen/include/asm-x86/mm.h | 3 +
 xen/arch/x86/mm.c | 16 +-
 xen/arch/x86/mm/mem_sharing.c | 169 +++++++++-
 xen/arch/x86/mm/mm-locks.h | 6 +-
 xen/include/asm-x86/mm.h | 2 +-
 xen/arch/x86/mm/mem_sharing.c | 106 ++++++
 xen/include/public/domctl.h | 3 +-
 tools/libxc/xc_memshr.c | 23 +
 tools/libxc/xenctrl.h | 6 +
 xen/arch/ia64/xen/mm.c | 6 +
 xen/arch/x86/mm/mem_sharing.c | 8 +
 xen/common/keyhandler.c | 7 +-
 xen/include/xen/mm.h | 3 +
 xen/arch/x86/mm/mem_sharing.c | 17 +-
 xen/include/public/domctl.h | 1 +
 tools/libxc/xc_memshr.c | 14 +
 tools/libxc/xenctrl.h | 2 +

 52 files changed, 1005 insertions(+), 397 deletions(-)
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c
xen/arch/x86/mm.c | 6 +- xen/arch/x86/mm/mem_sharing.c | 91 ++++++++++++++++++++++-------------------- 2 files changed, 50 insertions(+), 47 deletions(-) Harmless patch, with no functional changes, only code style issues. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 62c342fd7b9c -r aeebbff899ff xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4321,9 +4321,9 @@ int page_make_sharable(struct domain *d, return -EEXIST; } - /* Check if the ref count is 2. The first from PGT_allocated, and + /* Check if the ref count is 2. The first from PGC_allocated, and * the second from get_page_and_type at the top of this function */ - if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) + if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) ) { /* Return type count back to zero */ put_page_and_type(page); @@ -4340,7 +4340,7 @@ int page_make_sharable(struct domain *d, int page_make_private(struct domain *d, struct page_info *page) { - if(!get_page(page, dom_cow)) + if ( !get_page(page, dom_cow) ) return -EINVAL; spin_lock(&d->page_alloc_lock); diff -r 62c342fd7b9c -r aeebbff899ff xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -35,14 +35,14 @@ #include "mm-locks.h" /* Auditing of memory sharing code? */ -#define MEM_SHARING_AUDIT 0 +#define MEM_SHARING_AUDIT 0 #if MEM_SHARING_AUDIT static void mem_sharing_audit(void); #define MEM_SHARING_DEBUG(_f, _a...) \ debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a) #else -# define mem_sharing_audit() do {} while(0) +#define mem_sharing_audit() do {} while(0) #endif /* MEM_SHARING_AUDIT */ #define mem_sharing_enabled(d) \ @@ -56,7 +56,7 @@ static void mem_sharing_audit(void); #define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg)) static shr_handle_t next_handle = 1; -static atomic_t nr_saved_mfns = ATOMIC_INIT(0); +static atomic_t nr_saved_mfns = ATOMIC_INIT(0); typedef struct shr_hash_entry { @@ -84,9 +84,9 @@ static inline int list_has_one_entry(str return (head->next != head) && (head->next->next == head); } -static inline struct gfn_info* gfn_get_info(struct list_head *list) +static inline gfn_info_t *gfn_get_info(struct list_head *list) { - return list_entry(list->next, struct gfn_info, list); + return list_entry(list->next, gfn_info_t, list); } static void __init mem_sharing_hash_init(void) @@ -116,12 +116,12 @@ static gfn_info_t *mem_sharing_gfn_alloc static void mem_sharing_gfn_destroy(gfn_info_t *gfn, int was_shared) { /* Decrement the number of pages, if the gfn was shared before */ - if(was_shared) + if ( was_shared ) { struct domain *d = get_domain_by_id(gfn->domain); - /* Domain may have been destroyed by now, if we are called from - * p2m_teardown */ - if(d) + /* Domain may have been destroyed by now * + * (if we are called from p2m_teardown) */ + if ( d ) { atomic_dec(&d->shr_pages); put_domain(d); @@ -261,7 +261,8 @@ static struct page_info* mem_sharing_all mem_event_request_t req; page = alloc_domheap_page(d, 0); - if(page != NULL) return page; + if ( page != NULL ) + return page; memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_SHARED; @@ -301,7 +302,7 @@ static struct page_info* mem_sharing_all unsigned int mem_sharing_get_nr_saved_mfns(void) { - return (unsigned int)atomic_read(&nr_saved_mfns); + return ((unsigned int)atomic_read(&nr_saved_mfns)); } int mem_sharing_sharing_resume(struct domain *d) @@ -328,14 +329,15 @@ int mem_sharing_debug_mfn(unsigned long { struct page_info *page; - 
if(!mfn_valid(_mfn(mfn))) + if ( !mfn_valid(_mfn(mfn)) ) { - printk("Invalid MFN=%lx\n", mfn); + gdprintk(XENLOG_ERR, "Invalid MFN=%lx\n", mfn); return -1; } page = mfn_to_page(_mfn(mfn)); - printk("Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n", + gdprintk(XENLOG_DEBUG, + "Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n", mfn_x(page_to_mfn(page)), page->count_info, page->u.inuse.type_info, @@ -351,9 +353,9 @@ int mem_sharing_debug_gfn(struct domain mfn = get_gfn_unlocked(d, gfn, &p2mt); - printk("Debug for domain=%d, gfn=%lx, ", - d->domain_id, - gfn); + gdprintk(XENLOG_DEBUG, "Debug for domain=%d, gfn=%lx, ", + d->domain_id, + gfn); return mem_sharing_debug_mfn(mfn_x(mfn)); } @@ -370,8 +372,8 @@ int mem_sharing_debug_gfn(struct domain static grant_entry_header_t * shared_entry_header(struct grant_table *t, grant_ref_t ref) { - ASSERT(t->gt_version != 0); - if (t->gt_version == 1) + ASSERT (t->gt_version != 0); + if ( t->gt_version == 1 ) return (grant_entry_header_t*)&shared_entry_v1(t, ref); else return &shared_entry_v2(t, ref).hdr; @@ -381,25 +383,23 @@ static int mem_sharing_gref_to_gfn(struc grant_ref_t ref, unsigned long *gfn) { - if(d->grant_table->gt_version < 1) + if ( d->grant_table->gt_version < 1 ) return -1; - if (d->grant_table->gt_version == 1) + if ( d->grant_table->gt_version == 1 ) { grant_entry_v1_t *sha1; sha1 = &shared_entry_v1(d->grant_table, ref); *gfn = sha1->frame; - return 0; } else { grant_entry_v2_t *sha2; sha2 = &shared_entry_v2(d->grant_table, ref); *gfn = sha2->full_page.frame; - return 0; } - return -2; + return 0; } /* Account for a GFN being shared/unshared. * When sharing this function needs to be called _before_ gfn lists are merged * together, but _after_ gfn is removed from the list when unsharing. */ @@ -439,20 +439,22 @@ int mem_sharing_debug_gref(struct domain uint16_t status; unsigned long gfn; - if(d->grant_table->gt_version < 1) + if ( d->grant_table->gt_version < 1 ) { - printk("Asked to debug [dom=%d,gref=%d], but not yet inited.\n", + gdprintk(XENLOG_ERR, + "Asked to debug [dom=%d,gref=%d], but not yet inited.\n", d->domain_id, ref); return -1; } - mem_sharing_gref_to_gfn(d, ref, &gfn); + (void)mem_sharing_gref_to_gfn(d, ref, &gfn); shah = shared_entry_header(d->grant_table, ref); - if (d->grant_table->gt_version == 1) + if ( d->grant_table->gt_version == 1 ) status = shah->flags; else status = status_entry(d->grant_table, ref); - printk("==> Grant [dom=%d,ref=%d], status=%x. ", + gdprintk(XENLOG_DEBUG, + "==> Grant [dom=%d,ref=%d], status=%x. ", d->domain_id, ref, status); return mem_sharing_debug_gfn(d, gfn); @@ -478,24 +480,24 @@ int mem_sharing_nominate_page(struct dom /* Check if mfn is valid */ ret = -EINVAL; - if (!mfn_valid(mfn)) + if ( !mfn_valid(mfn) ) goto out; /* Return the handle if the page is already shared */ page = mfn_to_page(mfn); - if (p2m_is_shared(p2mt)) { + if ( p2m_is_shared(p2mt) ) { *phandle = page->shr_handle; ret = 0; goto out; } /* Check p2m type */ - if (!p2m_is_sharable(p2mt)) + if ( !p2m_is_sharable(p2mt) ) goto out; /* Try to convert the mfn to the sharable type */ ret = page_make_sharable(d, page, expected_refcnt); - if(ret) + if ( ret ) goto out; /* Create the handle */ @@ -512,7 +514,7 @@ int mem_sharing_nominate_page(struct dom } /* Change the p2m type */ - if(p2m_change_type(d, gfn, p2mt, p2m_ram_shared) != p2mt) + if ( p2m_change_type(d, gfn, p2mt, p2m_ram_shared) != p2mt ) { /* This is unlikely, as the type must have changed since we've checked * it a few lines above. @@ -618,7 +620,7 @@ int mem_sharing_unshare_page(struct doma mfn = get_gfn(d, gfn, &p2mt); /* Has someone already unshared it? 
*/ - if (!p2m_is_shared(p2mt)) { + if ( !p2m_is_shared(p2mt) ) { put_gfn(d, gfn); shr_unlock(); return 0; } @@ -631,10 +633,11 @@ int mem_sharing_unshare_page(struct doma list_for_each(le, &hash_entry->gfns) { gfn_info = list_entry(le, struct gfn_info, list); - if((gfn_info->gfn == gfn) && (gfn_info->domain == d->domain_id)) + if ( (gfn_info->gfn == gfn) && (gfn_info->domain == d->domain_id) ) goto gfn_found; } - printk("Could not find gfn_info for shared gfn: %lx\n", gfn); + gdprintk(XENLOG_ERR, "Could not find gfn_info for shared gfn: " + "%lx\n", gfn); BUG(); gfn_found: /* Delete gfn_info from the list, but hold on to it, until we've allocated * memory to make a copy */ @@ -644,7 +647,7 @@ gfn_found: /* If the GFN is getting destroyed drop the references to MFN * (possibly freeing the page), and exit early */ - if(flags & MEM_SHARING_DESTROY_GFN) + if ( flags & MEM_SHARING_DESTROY_GFN ) { mem_sharing_gfn_destroy(gfn_info, !last_gfn); if(last_gfn) @@ -702,8 +705,8 @@ private_page_found: else atomic_dec(&nr_saved_mfns); - if(p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != - p2m_ram_shared) + if ( p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != + p2m_ram_shared ) { printk("Could not change p2m type.\n"); BUG(); } @@ -740,7 +743,7 @@ int mem_sharing_domctl(struct domain *d, { unsigned long gfn = mec->u.nominate.u.gfn; shr_handle_t handle; - if(!mem_sharing_enabled(d)) + if ( !mem_sharing_enabled(d) ) return -EINVAL; rc = mem_sharing_nominate_page(d, gfn, 0, &handle); mec->u.nominate.handle = handle; @@ -753,9 +756,9 @@ int mem_sharing_domctl(struct domain *d, unsigned long gfn; shr_handle_t handle; - if(!mem_sharing_enabled(d)) + if ( !mem_sharing_enabled(d) ) return -EINVAL; - if(mem_sharing_gref_to_gfn(d, gref, &gfn) < 0) + if ( mem_sharing_gref_to_gfn(d, gref, &gfn) < 0 ) return -EINVAL; rc = mem_sharing_nominate_page(d, gfn, 3, &handle); mec->u.nominate.handle = handle; @@ -772,7 +775,7 @@ int mem_sharing_domctl(struct domain *d, case XEN_DOMCTL_MEM_EVENT_OP_SHARING_RESUME: { - if(!mem_sharing_enabled(d)) + if ( !mem_sharing_enabled(d) ) return -EINVAL; rc = mem_sharing_sharing_resume(d); }
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 02 of 18] x86/mm: Making a page sharable sets PGT_validated, but making a page private doesn't expect it
xen/arch/x86/mm.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r aeebbff899ff -r 61da3fc46f06 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4347,7 +4347,7 @@ int page_make_private(struct domain *d, /* We can only change the type if count is one */ if ( (page->u.inuse.type_info & (PGT_type_mask | PGT_count_mask)) - != (PGT_shared_page | 1) ) + != (PGT_shared_page | PGT_validated | 1) ) { put_page(page); spin_unlock(&d->page_alloc_lock);
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 03 of 18] x86/mm: Eliminate hash table in sharing code as index of shared mfns
xen/arch/x86/mm/mem_sharing.c | 526 +++++++++++++++++++------------------ xen/include/asm-x86/mem_sharing.h | 15 +- xen/include/asm-x86/mm.h | 11 +- xen/include/public/domctl.h | 3 + 4 files changed, 296 insertions(+), 259 deletions(-) The hashtable mechanism used by the sharing code is a bit odd. It seems unnecessary and adds complexity. Instead, we replace this by storing a list head directly in the page info for the case when the page is shared. This does not add any extra space to the page_info and serves to remove significant complexity from sharing. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 61da3fc46f06 -r 3038770886aa xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -3,6 +3,7 @@ * * Memory sharing support. * + * Copyright (c) 2011 GridCentric, Inc. (Adin Scannell & Andres Lagar-Cavilla) * Copyright (c) 2009 Citrix Systems, Inc. (Grzegorz Milos) * * This program is free software; you can redistribute it and/or modify @@ -38,11 +39,28 @@ #define MEM_SHARING_AUDIT 0 #if MEM_SHARING_AUDIT -static void mem_sharing_audit(void); +static int mem_sharing_audit(void); #define MEM_SHARING_DEBUG(_f, _a...) \ debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a) +static struct list_head shr_audit_list; + +static inline void audit_add_list(struct page_info *page) +{ + list_add(&page->shared_info->entry, &shr_audit_list); +} + +static inline void audit_del_list(struct page_info *page) +{ + list_del(&page->shared_info->entry); +} #else -#define mem_sharing_audit() do {} while(0) +static inline int mem_sharing_audit(void) +{ + return 0; +} + +#define audit_add_list(p) ((void)0) +#define audit_del_list(p) ((void)0) #endif /* MEM_SHARING_AUDIT */ #define mem_sharing_enabled(d) \ @@ -58,17 +76,6 @@ static void mem_sharing_audit(void); static shr_handle_t next_handle = 1; static atomic_t nr_saved_mfns = ATOMIC_INIT(0); -typedef struct shr_hash_entry -{ - shr_handle_t handle; - mfn_t mfn; - struct shr_hash_entry *next; - struct list_head gfns; -} shr_hash_entry_t; - -#define SHR_HASH_LENGTH 1000 -static shr_hash_entry_t *shr_hash[SHR_HASH_LENGTH]; - typedef struct gfn_info { unsigned long gfn; @@ -89,36 +96,30 @@ static inline gfn_info_t *gfn_get_info(s return list_entry(list->next, gfn_info_t, list); } -static void __init mem_sharing_hash_init(void) +static inline gfn_info_t *mem_sharing_gfn_alloc(struct page_info *page, + struct domain *d, + unsigned long gfn) { - int i; + gfn_info_t *gfn_info = xmalloc(gfn_info_t); - mm_lock_init(&shr_lock); - for(i=0; i<SHR_HASH_LENGTH; i++) - shr_hash[i] = NULL; + if ( gfn_info == NULL ) + return NULL; + + gfn_info->gfn = gfn; + gfn_info->domain = d->domain_id; + INIT_LIST_HEAD(&gfn_info->list); + list_add(&gfn_info->list, &page->shared_info->gfns); + + return gfn_info; } -static shr_hash_entry_t *mem_sharing_hash_alloc(void) -{ - return xmalloc(shr_hash_entry_t); -} - -static void mem_sharing_hash_destroy(shr_hash_entry_t *e) -{ - xfree(e); -} - -static gfn_info_t *mem_sharing_gfn_alloc(void) -{ - return xmalloc(gfn_info_t); -} - -static void mem_sharing_gfn_destroy(gfn_info_t *gfn, int was_shared) +static inline void mem_sharing_gfn_destroy(gfn_info_t *gfn_info, + int was_shared) { /* Decrement the number of pages, if the gfn was shared before */ if ( was_shared ) { - struct domain *d = get_domain_by_id(gfn->domain); + struct domain *d = get_domain_by_id(gfn_info->domain); /* Domain may have been destroyed by now * * (if 
we are called from p2m_teardown) */ if ( d ) @@ -127,128 +128,118 @@ static void mem_sharing_gfn_destroy(gfn_ put_domain(d); } } - xfree(gfn); + + list_del(&gfn_info->list); + xfree(gfn_info); } -static shr_hash_entry_t* mem_sharing_hash_lookup(shr_handle_t handle) +static struct page_info* mem_sharing_lookup(unsigned long mfn) { - shr_hash_entry_t *e; - - e = shr_hash[handle % SHR_HASH_LENGTH]; - while(e != NULL) + if ( mfn_valid(_mfn(mfn)) ) { - if(e->handle == handle) - return e; - e = e->next; + struct page_info* page = mfn_to_page(_mfn(mfn)); + if ( page_get_owner(page) == dom_cow ) + { + ASSERT(page->u.inuse.type_info & PGT_type_mask); + ASSERT(get_gpfn_from_mfn(mfn) == SHARED_M2P_ENTRY); + return page; + } } return NULL; } -static shr_hash_entry_t* mem_sharing_hash_insert(shr_handle_t handle, mfn_t mfn) +#if MEM_SHARING_AUDIT +static int mem_sharing_audit(void) { - shr_hash_entry_t *e, **ee; - - e = mem_sharing_hash_alloc(); - if(e == NULL) return NULL; - e->handle = handle; - e->mfn = mfn; - ee = &shr_hash[handle % SHR_HASH_LENGTH]; - e->next = *ee; - *ee = e; - return e; -} - -static void mem_sharing_hash_delete(shr_handle_t handle) -{ - shr_hash_entry_t **pprev, *e; - - pprev = &shr_hash[handle % SHR_HASH_LENGTH]; - e = *pprev; - while(e != NULL) - { - if(e->handle == handle) - { - *pprev = e->next; - mem_sharing_hash_destroy(e); - return; - } - pprev = &e->next; - e = e->next; - } - printk("Could not find shr entry for handle %"PRIx64"\n", handle); - BUG(); -} - -#if MEM_SHARING_AUDIT -static void mem_sharing_audit(void) -{ - shr_hash_entry_t *e; - struct list_head *le; - gfn_info_t *g; - int bucket; - struct page_info *pg; + int errors = 0; + struct list_head *ae; ASSERT(shr_locked_by_me()); - for(bucket=0; bucket < SHR_HASH_LENGTH; bucket++) + list_for_each(ae, &shr_audit_list) { - e = shr_hash[bucket]; - /* Loop over all shr_hash_entries */ - while(e != NULL) + struct page_sharing_info *shared_info; + unsigned long nr_gfns = 0; + struct page_info *pg; + struct list_head *le; + mfn_t mfn; + + shared_info = list_entry(ae, struct page_sharing_info, entry); + pg = shared_info->pg; + mfn = page_to_mfn(pg); + + /* Check if the MFN has correct type, owner and handle. */ + if ( !(pg->u.inuse.type_info & PGT_shared_page) ) { - int nr_gfns=0; + MEM_SHARING_DEBUG("mfn %lx in audit list, but not PGT_shared_page (%d)!\n", + mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask); + errors++; + continue; + } - /* Check if the MFN has correct type, owner and handle */ - pg = mfn_to_page(e->mfn); - if((pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page) - MEM_SHARING_DEBUG("mfn %lx not shared, but in the hash!\n", - mfn_x(e->mfn)); - if(page_get_owner(pg) != dom_cow) - MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner (%d)!\n", - mfn_x(e->mfn), - page_get_owner(pg)->domain_id); - if(e->handle != pg->shr_handle) - MEM_SHARING_DEBUG("mfn %lx shared, but wrong handle " - "(%ld != %ld)!\n", - mfn_x(e->mfn), pg->shr_handle, e->handle); - /* Check if all GFNs map to the MFN, and the p2m types */ - list_for_each(le, &e->gfns) + /* Check the page owner. 
*/ + if ( page_get_owner(pg) != dom_cow ) + { + MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner (%d)!\n", + mfn_x(mfn), page_get_owner(pg)->domain_id); + errors++; + } + + /* Check the m2p entry */ + if ( get_gpfn_from_mfn(mfn_x(mfn)) != SHARED_M2P_ENTRY ) + { + MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry (%lx)!\n", + mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn))); + errors++; + } + + /* Check if all GFNs map to the MFN, and the p2m types */ + list_for_each(le, &pg->shared_info->gfns) + { + struct domain *d; + p2m_type_t t; + mfn_t o_mfn; + gfn_info_t *g; + + g = list_entry(le, gfn_info_t, list); + d = get_domain_by_id(g->domain); + if ( d == NULL ) { - struct domain *d; - p2m_type_t t; - mfn_t mfn; - - g = list_entry(le, struct gfn_info, list); - d = get_domain_by_id(g->domain); - if(d == NULL) - { - MEM_SHARING_DEBUG("Unknow dom: %d, for PFN=%lx, MFN=%lx\n", - g->domain, g->gfn, mfn_x(e->mfn)); - continue; - } - mfn = get_gfn_unlocked(d, g->gfn, &t); - if(mfn_x(mfn) != mfn_x(e->mfn)) - MEM_SHARING_DEBUG("Incorrect P2M for d=%d, PFN=%lx." - "Expecting MFN=%ld, got %ld\n", - g->domain, g->gfn, mfn_x(e->mfn), - mfn_x(mfn)); - if(t != p2m_ram_shared) - MEM_SHARING_DEBUG("Incorrect P2M type for d=%d, PFN=%lx." - "Expecting t=%d, got %d\n", - g->domain, g->gfn, mfn_x(e->mfn), - p2m_ram_shared, t); - put_domain(d); - nr_gfns++; - } - if(nr_gfns != (pg->u.inuse.type_info & PGT_count_mask)) - MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx." - "nr_gfns in hash %d, in type_info %d\n", - mfn_x(e->mfn), nr_gfns, - (pg->u.inuse.type_info & PGT_count_mask)); - e = e->next; + MEM_SHARING_DEBUG("Unknown dom: %d, for PFN=%lx, MFN=%lx\n", + g->domain, g->gfn, mfn); + errors++; + continue; + } + o_mfn = get_gfn_query_unlocked(d, g->gfn, &t); + if ( mfn_x(o_mfn) != mfn_x(mfn) ) + { + MEM_SHARING_DEBUG("Incorrect P2M for d=%d, PFN=%lx." + "Expecting MFN=%ld, got %ld\n", + g->domain, g->gfn, mfn, mfn_x(o_mfn)); + errors++; + } + if ( t != p2m_ram_shared ) + { + MEM_SHARING_DEBUG("Incorrect P2M type for d=%d, PFN=%lx." + "Expecting t=%d, got %d\n", + g->domain, g->gfn, mfn, p2m_ram_shared, t); + errors++; + } + put_domain(d); + nr_gfns++; + } + if ( nr_gfns != (pg->u.inuse.type_info & PGT_count_mask) ) + { + MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx." + "nr_gfns in hash %d, in type_info %d\n", + mfn_x(mfn), nr_gfns, + (pg->u.inuse.type_info & PGT_count_mask)); + errors++; } } + + return errors; } #endif @@ -402,36 +393,6 @@ static int mem_sharing_gref_to_gfn(struc return 0; } -/* Account for a GFN being shared/unshared. - * When sharing this function needs to be called _before_ gfn lists are merged - * together, but _after_ gfn is removed from the list when unsharing. - */ -static int mem_sharing_gfn_account(struct gfn_info *gfn, int sharing) -{ - struct domain *d; - - /* A) When sharing: - * if the gfn being shared is in > 1 long list, its already been - * accounted for - * B) When unsharing: - * if the list is longer than > 1, we don''t have to account yet. 
- */ - if(list_has_one_entry(&gfn->list)) - { - d = get_domain_by_id(gfn->domain); - BUG_ON(!d); - if(sharing) - atomic_inc(&d->shr_pages); - else - atomic_dec(&d->shr_pages); - put_domain(d); - - return 1; - } - mem_sharing_audit(); - - return 0; -} int mem_sharing_debug_gref(struct domain *d, grant_ref_t ref) { @@ -469,8 +430,6 @@ int mem_sharing_nominate_page(struct dom mfn_t mfn; struct page_info *page; int ret; - shr_handle_t handle; - shr_hash_entry_t *hash_entry; struct gfn_info *gfn_info; *phandle = 0UL; @@ -486,7 +445,7 @@ int mem_sharing_nominate_page(struct dom /* Return the handle if the page is already shared */ page = mfn_to_page(mfn); if ( p2m_is_shared(p2mt) ) { - *phandle = page->shr_handle; + *phandle = page->shared_info->handle; ret = 0; goto out; } @@ -500,16 +459,27 @@ int mem_sharing_nominate_page(struct dom if ( ret ) goto out; - /* Create the handle */ + /* Initialize the shared state */ ret = -ENOMEM; - handle = next_handle++; - if((hash_entry = mem_sharing_hash_insert(handle, mfn)) == NULL) + if ( (page->shared_info = + xmalloc(struct page_sharing_info)) == NULL ) { + BUG_ON(page_make_private(d, page) != 0); goto out; } - if((gfn_info = mem_sharing_gfn_alloc()) == NULL) + page->shared_info->pg = page; + INIT_LIST_HEAD(&page->shared_info->gfns); + INIT_LIST_HEAD(&page->shared_info->entry); + + /* Create the handle */ + page->shared_info->handle = next_handle++; + + /* Create the local gfn info */ + if ( (gfn_info = mem_sharing_gfn_alloc(page, d, gfn)) == NULL ) { - mem_sharing_hash_destroy(hash_entry); + xfree(page->shared_info); + page->shared_info = NULL; + BUG_ON(page_make_private(d, page) != 0); goto out; } @@ -520,23 +490,19 @@ int mem_sharing_nominate_page(struct dom * it a few lines above. * The mfn needs to revert back to rw type. This should never fail, * since no-one knew that the mfn was temporarily sharable */ + mem_sharing_gfn_destroy(gfn_info, 0); + xfree(page->shared_info); + page->shared_info = NULL; + /* NOTE: We haven't yet added this to the audit list. 
*/ BUG_ON(page_make_private(d, page) != 0); - mem_sharing_hash_destroy(hash_entry); - mem_sharing_gfn_destroy(gfn_info, 0); goto out; } /* Update m2p entry to SHARED_M2P_ENTRY */ set_gpfn_from_mfn(mfn_x(mfn), SHARED_M2P_ENTRY); - INIT_LIST_HEAD(&hash_entry->gfns); - INIT_LIST_HEAD(&gfn_info->list); - list_add(&gfn_info->list, &hash_entry->gfns); - gfn_info->gfn = gfn; - gfn_info->domain = d->domain_id; - page->shr_handle = handle; - *phandle = handle; - + *phandle = page->shared_info->handle; + audit_add_list(page); ret = 0; out: @@ -545,54 +511,92 @@ out: return ret; } -int mem_sharing_share_pages(shr_handle_t sh, shr_handle_t ch) +int mem_sharing_share_pages(struct domain *sd, unsigned long sgfn, shr_handle_t sh, + struct domain *cd, unsigned long cgfn, shr_handle_t ch) { - shr_hash_entry_t *se, *ce; struct page_info *spage, *cpage; struct list_head *le, *te; - struct gfn_info *gfn; + gfn_info_t *gfn; struct domain *d; - int ret; + int ret = -EINVAL, single_client_gfn, single_source_gfn; + mfn_t smfn, cmfn; + p2m_type_t smfn_type, cmfn_type; shr_lock(); + /* XXX if sd == cd handle potential deadlock by ordering + * the get_ and put_gfn's */ + smfn = get_gfn(sd, sgfn, &smfn_type); + cmfn = get_gfn(cd, cgfn, &cmfn_type); + ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; - se = mem_sharing_hash_lookup(sh); - if(se == NULL) goto err_out; + spage = mem_sharing_lookup(mfn_x(smfn)); + if ( spage == NULL ) + goto err_out; + ASSERT(smfn_type == p2m_ram_shared); ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; - ce = mem_sharing_hash_lookup(ch); - if(ce == NULL) goto err_out; - spage = mfn_to_page(se->mfn); - cpage = mfn_to_page(ce->mfn); - /* gfn lists always have at least one entry => save to call list_entry */ - mem_sharing_gfn_account(gfn_get_info(&ce->gfns), 1); - mem_sharing_gfn_account(gfn_get_info(&se->gfns), 1); - list_for_each_safe(le, te, &ce->gfns) + cpage = mem_sharing_lookup(mfn_x(cmfn)); + if ( cpage == NULL ) + goto err_out; + ASSERT(cmfn_type == p2m_ram_shared); + + /* Check that the handles match */ + if ( spage->shared_info->handle != sh ) { - gfn = list_entry(le, struct gfn_info, list); - /* Get the source page and type, this should never fail - * because we are under shr lock, and got non-null se */ + ret = XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID; + goto err_out; + } + if ( cpage->shared_info->handle != ch ) + { + ret = XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID; + goto err_out; + } + + /* Merge the lists together */ + single_source_gfn = list_has_one_entry(&spage->shared_info->gfns); + single_client_gfn = list_has_one_entry(&cpage->shared_info->gfns); + list_for_each_safe(le, te, &cpage->shared_info->gfns) + { + gfn = list_entry(le, gfn_info_t, list); + /* Get the source page and type, this should never fail: + * we are under shr lock, and got a successful lookup */ BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page)); - /* Move the gfn_info from ce list to se list */ + /* Move the gfn_info from client list to source list */ list_del(&gfn->list); + list_add(&gfn->list, &spage->shared_info->gfns); + put_page_and_type(cpage); d = get_domain_by_id(gfn->domain); BUG_ON(!d); - BUG_ON(set_shared_p2m_entry(d, gfn->gfn, se->mfn) == 0); + BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn) == 0); + if ( single_client_gfn ) + { + /* Only increase the per-domain count when we are actually + * sharing. 
And don't increase it should we ever re-share */ + atomic_inc(&d->shr_pages); + ASSERT( cd == d ); + } put_domain(d); - list_add(&gfn->list, &se->gfns); - put_page_and_type(cpage); - } - ASSERT(list_empty(&ce->gfns)); - mem_sharing_hash_delete(ch); - atomic_inc(&nr_saved_mfns); + } + ASSERT(list_empty(&cpage->shared_info->gfns)); + if ( single_source_gfn ) + atomic_inc(&sd->shr_pages); + + /* Clear the rest of the shared state */ + audit_del_list(cpage); + xfree(cpage->shared_info); + cpage->shared_info = NULL; + /* Free the client page */ if(test_and_clear_bit(_PGC_allocated, &cpage->count_info)) put_page(cpage); + + atomic_inc(&nr_saved_mfns); ret = 0; err_out: + put_gfn(cd, cgfn); + put_gfn(sd, sgfn); shr_unlock(); - return ret; } @@ -604,13 +608,9 @@ int mem_sharing_unshare_page(struct doma mfn_t mfn; struct page_info *page, *old_page; void *s, *t; - int ret, last_gfn; - shr_hash_entry_t *hash_entry; - struct gfn_info *gfn_info = NULL; - shr_handle_t handle; + int last_gfn; + gfn_info_t *gfn_info = NULL; struct list_head *le; - - /* Remove the gfn_info from the list */ /* This is one of the reasons why we can't enforce ordering * between shr_lock and p2m fine-grained locks in mm-lock. @@ -626,56 +626,70 @@ int mem_sharing_unshare_page(struct doma return 0; } - page = mfn_to_page(mfn); - handle = page->shr_handle; - - hash_entry = mem_sharing_hash_lookup(handle); - list_for_each(le, &hash_entry->gfns) + page = mem_sharing_lookup(mfn_x(mfn)); + if ( page == NULL ) { - gfn_info = list_entry(le, struct gfn_info, list); + gdprintk(XENLOG_ERR, "Domain p2m is shared, but page is not: " + "%lx\n", gfn); + BUG(); + } + + list_for_each(le, &page->shared_info->gfns) + { + gfn_info = list_entry(le, gfn_info_t, list); if ( (gfn_info->gfn == gfn) && (gfn_info->domain == d->domain_id) ) goto gfn_found; } gdprintk(XENLOG_ERR, "Could not find gfn_info for shared gfn: " "%lx\n", gfn); BUG(); + gfn_found: /* Delete gfn_info from the list, but hold on to it, until we've allocated * memory to make a copy */ - list_del(&gfn_info->list); - last_gfn = list_empty(&hash_entry->gfns); + last_gfn = list_has_one_entry(&page->shared_info->gfns); /* If the GFN is getting destroyed drop the references to MFN * (possibly freeing the page), and exit early */ if ( flags & MEM_SHARING_DESTROY_GFN ) { mem_sharing_gfn_destroy(gfn_info, !last_gfn); - if(last_gfn) - mem_sharing_hash_delete(handle); - else + if ( !last_gfn ) /* Even though we don't allocate a private page, we have to account * for the MFN that originally backed this PFN. */ atomic_dec(&nr_saved_mfns); + + if ( last_gfn ) + { + audit_del_list(page); + xfree(page->shared_info); + page->shared_info = NULL; + } + put_gfn(d, gfn); shr_unlock(); put_page_and_type(page); - if(last_gfn && - test_and_clear_bit(_PGC_allocated, &page->count_info)) + if( last_gfn && + test_and_clear_bit(_PGC_allocated, &page->count_info)) put_page(page); + return 0; } - ret = page_make_private(d, page); - BUG_ON(last_gfn & ret); - if(ret == 0) goto private_page_found; + if ( last_gfn ) + { + audit_del_list(page); + mem_sharing_gfn_destroy(gfn_info, !last_gfn); + xfree(page->shared_info); + page->shared_info = NULL; + BUG_ON(page_make_private(d, page) != 0); + goto private_page_found; + } old_page = page; page = mem_sharing_alloc_page(d, gfn); - if(!page) + if ( !page ) { - /* We've failed to obtain memory for private page. 
Need to re-add the - * gfn_info to relevant list */ - list_add(&gfn_info->list, &hash_entry->gfns); put_gfn(d, gfn); shr_unlock(); return -ENOMEM; @@ -687,30 +701,23 @@ gfn_found: unmap_domain_page(s); unmap_domain_page(t); - /* NOTE: set_shared_p2m_entry will switch the underlying mfn. If - * we do get_page withing get_gfn, the correct sequence here - * should be - get_page(page); - put_page(old_page); - * so that the ref to the old page is dropped, and a ref to - * the new page is obtained to later be dropped in put_gfn */ BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0); + /* We've got a private page, we can commit the gfn destruction */ + mem_sharing_gfn_destroy(gfn_info, !last_gfn); put_page_and_type(old_page); private_page_found: - /* We've got a private page, we can commit the gfn destruction */ - mem_sharing_gfn_destroy(gfn_info, !last_gfn); - if(last_gfn) - mem_sharing_hash_delete(handle); - else + if ( !last_gfn ) atomic_dec(&nr_saved_mfns); if ( p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != p2m_ram_shared ) { - printk("Could not change p2m type.\n"); + gdprintk(XENLOG_ERR, "Could not change p2m type d %hu gfn %lx.\n", + d->domain_id, gfn); BUG(); } + /* Update m2p entry */ set_gpfn_from_mfn(mfn_x(page_to_mfn(page)), gfn); @@ -767,9 +774,18 @@ int mem_sharing_domctl(struct domain *d, case XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE: { - shr_handle_t sh = mec->u.share.source_handle; - shr_handle_t ch = mec->u.share.client_handle; - rc = mem_sharing_share_pages(sh, ch); + unsigned long sgfn = mec->u.share.source_gfn; + shr_handle_t sh = mec->u.share.source_handle; + struct domain *cd = get_domain_by_id(mec->u.share.client_domain); + if ( cd ) + { + unsigned long cgfn = mec->u.share.client_gfn; + shr_handle_t ch = mec->u.share.client_handle; + rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); + put_domain(cd); + } + else + return -EEXIST; } break; @@ -817,6 +833,6 @@ int mem_sharing_domctl(struct domain *d, void __init mem_sharing_init(void) { printk("Initing memory sharing.\n"); - mem_sharing_hash_init(); + mm_lock_init(&shr_lock); } diff -r 61da3fc46f06 -r 3038770886aa xen/include/asm-x86/mem_sharing.h --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -22,13 +22,23 @@ #ifndef __MEM_SHARING_H__ #define __MEM_SHARING_H__ +#include <public/domctl.h> + +typedef uint64_t shr_handle_t; + +struct page_sharing_info +{ + struct page_info *pg; /* Back pointer to the page. */ + shr_handle_t handle; /* Globally unique version / handle. */ + struct list_head entry; /* List of all shared pages (entry). */ + struct list_head gfns; /* List of domains and gfns for this page (head). */ +}; + #ifdef __x86_64__ #define sharing_supported(_d) \ (is_hvm_domain(_d) && paging_mode_hap(_d)) -typedef uint64_t shr_handle_t; - unsigned int mem_sharing_get_nr_saved_mfns(void); int mem_sharing_nominate_page(struct domain *d, unsigned long gfn, @@ -41,6 +51,7 @@ int mem_sharing_unshare_page(struct doma int mem_sharing_sharing_resume(struct domain *d); int mem_sharing_domctl(struct domain *d, xen_domctl_mem_sharing_op_t *mec); + void mem_sharing_init(void); #else diff -r 61da3fc46f06 -r 3038770886aa xen/include/asm-x86/mm.h --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -31,6 +31,8 @@ struct page_list_entry __pdx_t next, prev; }; +struct page_sharing_info; + struct page_info { union { @@ -49,8 +51,13 @@ struct page_info /* For non-pinnable single-page shadows, a higher entry that points * at us. 
*/ paddr_t up; - /* For shared/sharable pages the sharing handle */ - uint64_t shr_handle; + /* For shared/sharable pages, we use a doubly-linked list + * of all the {pfn,domain} pairs that map this page. We also include + * an opaque handle, which is effectively a version, so that clients + * of sharing share the version they expect to. + * This list is allocated and freed when a page is shared/unshared. + */ + struct page_sharing_info *shared_info; }; /* Reference count and various PGC_xxx flags and fields. */ diff -r 61da3fc46f06 -r 3038770886aa xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -789,7 +789,10 @@ struct xen_domctl_mem_sharing_op { uint64_aligned_t handle; /* OUT: the handle */ } nominate; struct mem_sharing_op_share { /* OP_SHARE */ + uint64_aligned_t source_gfn; /* IN: the gfn of the source page */ uint64_aligned_t source_handle; /* IN: handle to the source page */ + uint64_aligned_t client_domain; /* IN: the client domain id */ + uint64_aligned_t client_gfn; /* IN: the client gfn */ uint64_aligned_t client_handle; /* IN: handle to the client page */ } share; struct mem_sharing_op_debug { /* OP_DEBUG_xxx */
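To make the data-structure change concrete: instead of hashing handles to entries, each shared page now carries a struct page_sharing_info holding its handle and the list of (domain, gfn) pairs that map it, plus an entry on a global audit list. The stand-alone model below illustrates the merge step that mem_sharing_share_pages performs. It is only a sketch: it uses a toy singly-linked list in plain C rather than Xen's list_head machinery, and every toy_* name is invented for illustration.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_gfn_info {                 /* models gfn_info_t */
    unsigned long gfn;
    uint16_t domain;
    struct toy_gfn_info *next;
};

struct toy_sharing_info {             /* models struct page_sharing_info */
    uint64_t handle;                  /* version/handle for this page */
    struct toy_gfn_info *gfns;        /* every (domain, gfn) mapping it */
};

/* Move all gfn_info entries from the client page's list onto the
 * source page's list, then free the client's sharing metadata; this
 * is the core of what mem_sharing_share_pages does. */
static void toy_share(struct toy_sharing_info *source,
                      struct toy_sharing_info **client)
{
    struct toy_gfn_info *g = (*client)->gfns;

    while (g != NULL) {
        struct toy_gfn_info *next = g->next;
        g->next = source->gfns;       /* splice onto the source list */
        source->gfns = g;
        g = next;
    }
    free(*client);                    /* client page no longer shared */
    *client = NULL;
}

int main(void)
{
    struct toy_sharing_info *s = calloc(1, sizeof(*s));
    struct toy_sharing_info *c = calloc(1, sizeof(*c));
    struct toy_gfn_info a = { 0x100, 1, NULL };
    struct toy_gfn_info b = { 0x200, 2, NULL };

    s->handle = 1; s->gfns = &a;      /* source: gfn 0x100 of domain 1 */
    c->handle = 2; c->gfns = &b;      /* client: gfn 0x200 of domain 2 */

    toy_share(s, &c);

    for (struct toy_gfn_info *g = s->gfns; g != NULL; g = g->next)
        printf("dom %u gfn %lx now backed by the source page\n",
               (unsigned)g->domain, g->gfn);
    free(s);
    return 0;
}

The real function additionally transfers the page type references from the client page to the source page and rewrites each client gfn's p2m entry to point at the source mfn; the model shows only the metadata splice.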
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants
xen/arch/x86/mm/mem_sharing.c | 65 +++++++++++++++++++++++++++++++++++++----- xen/include/public/domctl.h | 9 +++++ 2 files changed, 65 insertions(+), 9 deletions(-) Previously, the mem sharing code would return an opaque handle to index shared pages (and nominees) in its global hash table. With the hash table removed, the new interface requires a gfn and a version. However, when sharing grants, the caller only has a grant ref and a version. Update the interface to handle this case. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 3038770886aa -r 2e8d5702f4c1 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -774,18 +774,65 @@ int mem_sharing_domctl(struct domain *d, case XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE: { - unsigned long sgfn = mec->u.share.source_gfn; - shr_handle_t sh = mec->u.share.source_handle; - struct domain *cd = get_domain_by_id(mec->u.share.client_domain); - if ( cd ) + unsigned long sgfn, cgfn; + struct domain *cd; + shr_handle_t sh, ch; + int source_is_gref = 0; + + if ( !mem_sharing_enabled(d) ) + return -EINVAL; + + cd = get_domain_by_id(mec->u.share.client_domain); + if ( !cd ) + return -ESRCH; + + if ( !mem_sharing_enabled(cd) ) { - unsigned long cgfn = mec->u.share.client_gfn; - shr_handle_t ch = mec->u.share.client_handle; - rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); put_domain(cd); + return -EINVAL; } - else - return -EEXIST; + + if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.source_gfn) ) + { + grant_ref_t gref = (grant_ref_t) + (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF( + mec->u.share.source_gfn)); + if ( mem_sharing_gref_to_gfn(d, gref, &sgfn) < 0 ) + { + put_domain(cd); + return -EINVAL; + } + source_is_gref = 1; + } else { + sgfn = mec->u.share.source_gfn; + } + + if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) ) + { + grant_ref_t gref = (grant_ref_t) + (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF( + mec->u.share.client_gfn)); + if ( (!source_is_gref) || + (mem_sharing_gref_to_gfn(cd, gref, &cgfn) < 0) ) + { + put_domain(cd); + return -EINVAL; + } + } else { + if ( source_is_gref ) + { + put_domain(cd); + return -EINVAL; + } + cgfn = mec->u.share.client_gfn; + } + + sh = mec->u.share.source_handle; + ch = mec->u.share.client_handle; + + rc = mem_sharing_share_pages(d, sgfn, sh, cd, cgfn, ch); + + put_domain(cd); } break; diff -r 3038770886aa -r 2e8d5702f4c1 xen/include/public/domctl.h --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -775,6 +775,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_mem_e #define XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID (-10) #define XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID (-9) +#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG (1ULL << 62) + +#define XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(field, val) \ + (field) = (XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG | val) +#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(field) \ + ((field) & XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG) +#define XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(field) \ + ((field) & (~XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG)) + struct xen_domctl_mem_sharing_op { uint8_t op; /* XEN_DOMCTL_MEM_EVENT_OP_* */
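The bit-62 flag works because grant references are 32-bit while the gfn fields are 64-bit, so the top bits are free to carry the discriminator. Here is a stand-alone round trip of the encoding; the macro definitions are copied from the patch above, while the main() harness is just an illustration:

#include <stdio.h>
#include <stdint.h>

#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG (1ULL << 62)

#define XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(field, val) \
    (field) = (XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG | val)
#define XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(field)        \
    ((field) & XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG)
#define XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(field)       \
    ((field) & (~XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF_FLAG))

int main(void)
{
    uint64_t field;
    uint32_t gref = 42;   /* a grant reference, not a plain gfn */

    /* Encode: set the flag bit alongside the 32-bit gref. */
    XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF(field, gref);

    /* Decode: the hypervisor side checks the flag, then strips it. */
    if (XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(field))
        printf("field encodes gref %llu\n", (unsigned long long)
               XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(field));
    return 0;
}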
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 05 of 18] Tools: Do not assume sharing target is dom0 in libxc wrappers
tools/libxc/xc_memshr.c | 3 ++- tools/libxc/xenctrl.h | 1 + 2 files changed, 3 insertions(+), 1 deletions(-) The libxc wrapper that shares two pages always assumed one of the pages was mapped by dom0. Expand the API to specify both sharing domains. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 2e8d5702f4c1 -r b398fc97ab19 tools/libxc/xc_memshr.c --- a/tools/libxc/xc_memshr.c +++ b/tools/libxc/xc_memshr.c @@ -87,6 +87,7 @@ int xc_memshr_nominate_gref(xc_interface } int xc_memshr_share(xc_interface *xch, + uint32_t source_domain, uint64_t source_handle, uint64_t client_handle) { @@ -95,7 +96,7 @@ int xc_memshr_share(xc_interface *xch, domctl.cmd = XEN_DOMCTL_mem_sharing_op; domctl.interface_version = XEN_DOMCTL_INTERFACE_VERSION; - domctl.domain = 0; + domctl.domain = (domid_t) source_domain; op = &(domctl.u.mem_sharing_op); op->op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE; op->u.share.source_handle = source_handle; diff -r 2e8d5702f4c1 -r b398fc97ab19 tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -1892,6 +1892,7 @@ int xc_memshr_nominate_gref(xc_interface grant_ref_t gref, uint64_t *handle); int xc_memshr_share(xc_interface *xch, + uint32_t source_domain, uint64_t source_handle, uint64_t client_handle); int xc_memshr_domain_resume(xc_interface *xch,
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 06 of 18] Tools: Update libxc mem sharing interface
tools/libxc/xc_memshr.c | 19 ++++++++++++++++++- tools/libxc/xenctrl.h | 7 ++++++- 2 files changed, 24 insertions(+), 2 deletions(-) Previously, the mem sharing code would return an opaque handle to index shared pages (and nominees) in its global hash table. With the hash table removed, the handle becomes a version, used to avoid sharing a stale version of a page. The libxc wrappers need to be updated accordingly. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r b398fc97ab19 -r 6ad4a8da105e tools/libxc/xc_memshr.c --- a/tools/libxc/xc_memshr.c +++ b/tools/libxc/xc_memshr.c @@ -88,8 +88,13 @@ int xc_memshr_nominate_gref(xc_interface int xc_memshr_share(xc_interface *xch, uint32_t source_domain, + uint64_t source_gfn, uint64_t source_handle, - uint64_t client_handle) + int source_is_gref, + uint32_t client_domain, + uint64_t client_gfn, + uint64_t client_handle, + int client_is_gref) { DECLARE_DOMCTL; struct xen_domctl_mem_sharing_op *op; @@ -100,8 +105,20 @@ int xc_memshr_share(xc_interface *xch, op = &(domctl.u.mem_sharing_op); op->op = XEN_DOMCTL_MEM_EVENT_OP_SHARING_SHARE; op->u.share.source_handle = source_handle; + op->u.share.client_domain = (uint64_t) client_domain; op->u.share.client_handle = client_handle; + if (source_is_gref) + XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF( + op->u.share.source_gfn, source_gfn); + else + op->u.share.source_gfn = source_gfn; + if (client_is_gref) + XEN_DOMCTL_MEM_SHARING_FIELD_MAKE_GREF( + op->u.share.client_gfn, client_gfn); + else + op->u.share.client_gfn = client_gfn; + return do_domctl(xch, &domctl); } diff -r b398fc97ab19 -r 6ad4a8da105e tools/libxc/xenctrl.h --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -1893,8 +1893,13 @@ int xc_memshr_nominate_gref(xc_interface uint64_t *handle); int xc_memshr_share(xc_interface *xch, uint32_t source_domain, + uint64_t source_gfn, uint64_t source_handle, - uint64_t client_handle) + int source_is_gref, + uint32_t client_domain, + uint64_t client_gfn, + uint64_t client_handle, + int dest_is_gref); int xc_memshr_domain_resume(xc_interface *xch, uint32_t domid); int xc_memshr_debug_gfn(xc_interface *xch,
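Taken together with patch 05, a caller now names both pages explicitly by (domain, gfn-or-gref, handle). Below is a minimal sketch of sharing two previously nominated gfns; it assumes the usual xc_interface_open()/xc_interface_close() bootstrap, and the gfns and handles are assumed to come from the caller's earlier nominate calls (error handling trimmed):

#include <stdio.h>
#include <xenctrl.h>

/* Sketch: share one page between two domains with the updated API.
 * A 0 in the *_is_gref slots means the field is a plain gfn. */
static int share_one_page(xc_interface *xch,
                          uint32_t source_dom, uint64_t source_gfn,
                          uint64_t source_handle,
                          uint32_t client_dom, uint64_t client_gfn,
                          uint64_t client_handle)
{
    int rc = xc_memshr_share(xch,
                             source_dom, source_gfn, source_handle, 0,
                             client_dom, client_gfn, client_handle, 0);
    if (rc)
        /* A stale handle (the page changed since nomination) is the
         * expected failure mode; the caller should re-nominate. */
        fprintf(stderr, "xc_memshr_share failed: rc=%d\n", rc);
    return rc;
}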
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 07 of 18] Tools: Update memshr tool to use new sharing API
tools/blktap2/drivers/Makefile | 2 +- tools/blktap2/drivers/tapdisk-image.c | 2 +- tools/blktap2/drivers/tapdisk-vbd.c | 6 +++--- tools/blktap2/drivers/tapdisk.h | 6 +++++- tools/memshr/bidir-daemon.c | 4 ++++ tools/memshr/bidir-hash.h | 13 ++++++++----- tools/memshr/interface.c | 31 +++++++++++++++++++------------ tools/memshr/memshr.h | 11 +++++++++-- 8 files changed, 50 insertions(+), 25 deletions(-) The only (in-tree, that we know of) consumer of the mem sharing API is the memshr tool (conditionally linked into blktap2). Update it to use the new API. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/Makefile --- a/tools/blktap2/drivers/Makefile +++ b/tools/blktap2/drivers/Makefile @@ -43,7 +43,7 @@ MEMSHR_DIR = $(XEN_ROOT)/tools/memshr MEMSHRLIBS : ifeq ($(CONFIG_Linux), __fixme__) CFLAGS += -DMEMSHR -MEMSHRLIBS += $(MEMSHR_DIR)/libmemshr.a +MEMSHRLIBS += -L$(XEN_ROOT)/tools/libxc -lxenctrl $(MEMSHR_DIR)/libmemshr.a endif ifeq ($(VHD_STATIC),y) diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/tapdisk-image.c --- a/tools/blktap2/drivers/tapdisk-image.c +++ b/tools/blktap2/drivers/tapdisk-image.c @@ -60,7 +60,7 @@ tapdisk_image_allocate(const char *file, image->storage = storage; image->private = private; #ifdef MEMSHR - image->memshr_id = memshr_vbd_image_get(file); + image->memshr_id = memshr_vbd_image_get((char *)file); #endif INIT_LIST_HEAD(&image->next); diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/tapdisk-vbd.c --- a/tools/blktap2/drivers/tapdisk-vbd.c +++ b/tools/blktap2/drivers/tapdisk-vbd.c @@ -1218,14 +1218,14 @@ __tapdisk_vbd_complete_td_request(td_vbd #ifdef MEMSHR if (treq.op == TD_OP_READ && td_flag_test(image->flags, TD_OPEN_RDONLY)) { - uint64_t hnd = treq.memshr_hnd; + share_tuple_t hnd = treq.memshr_hnd; uint16_t uid = image->memshr_id; blkif_request_t *breq = &vreq->req; uint64_t sec = tapdisk_vbd_breq_get_sector(breq, treq); int secs = breq->seg[treq.sidx].last_sect - breq->seg[treq.sidx].first_sect + 1; - if (hnd != 0) + if (hnd.handle != 0) memshr_vbd_complete_ro_request(hnd, uid, sec, secs); } @@ -1297,7 +1297,7 @@ __tapdisk_vbd_reissue_td_request(td_vbd_ /* Reset memshr handle. This'll prevent * memshr_vbd_complete_ro_request being called */ - treq.memshr_hnd = 0; + treq.memshr_hnd.handle = 0; td_complete_request(treq, 0); } else td_queue_read(parent, treq); diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/blktap2/drivers/tapdisk.h --- a/tools/blktap2/drivers/tapdisk.h +++ b/tools/blktap2/drivers/tapdisk.h @@ -64,6 +64,10 @@ #include "tapdisk-log.h" #include "tapdisk-utils.h" +#ifdef MEMSHR +#include "memshr.h" +#endif + #define DPRINTF(_f, _a...) syslog(LOG_INFO, _f, ##_a) #define EPRINTF(_f, _a...) syslog(LOG_ERR, "tap-err:%s: " _f, __func__, ##_a) #define PERROR(_f, _a...) 
EPRINTF(_f ": %s", ##_a, strerror(errno)) @@ -136,7 +140,7 @@ struct td_request { void *private; #ifdef MEMSHR - uint64_t memshr_hnd; + share_tuple_t memshr_hnd; #endif }; diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/bidir-daemon.c --- a/tools/memshr/bidir-daemon.c +++ b/tools/memshr/bidir-daemon.c @@ -48,7 +48,11 @@ void* bidir_daemon(void *unused) to_remove = 0.1 * max_nr_ent; while(to_remove > 0) { +#if 0 ret = blockshr_shrhnd_remove(blks_hash, next_remove, NULL); +#else + ret = -1; +#endif if(ret < 0) { /* We failed to remove an entry, because of a serious hash diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/bidir-hash.h --- a/tools/memshr/bidir-hash.h +++ b/tools/memshr/bidir-hash.h @@ -81,15 +81,16 @@ static int fgprtshr_mfn_cmp(uint32_t m1, #undef BIDIR_VALUE #undef BIDIR_KEY_T #undef BIDIR_VALUE_T + /* TODO better hashes! */ static inline uint32_t blockshr_block_hash(vbdblk_t block) { return (uint32_t)(block.sec) ^ (uint32_t)(block.disk_id); } -static inline uint32_t blockshr_shrhnd_hash(uint64_t shrhnd) +static inline uint32_t blockshr_shrhnd_hash(share_tuple_t shrhnd) { - return (uint32_t)shrhnd; + return ((uint32_t) shrhnd.handle); } static inline int blockshr_block_cmp(vbdblk_t b1, vbdblk_t b2) @@ -97,15 +98,17 @@ static inline int blockshr_block_cmp(vbd return (b1.sec == b2.sec) && (b1.disk_id == b2.disk_id); } -static inline int blockshr_shrhnd_cmp(uint64_t h1, uint64_t h2) +static inline int blockshr_shrhnd_cmp(share_tuple_t h1, share_tuple_t h2) { - return (h1 == h2); + return ( (h1.domain == h2.domain) && + (h1.frame == h2.frame) && + (h1.handle == h2.handle) ); } #define BIDIR_NAME_PREFIX blockshr #define BIDIR_KEY block #define BIDIR_VALUE shrhnd #define BIDIR_KEY_T vbdblk_t -#define BIDIR_VALUE_T uint64_t +#define BIDIR_VALUE_T share_tuple_t #include "bidir-namedefs.h" #endif /* BLOCK_MAP */ diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/interface.c --- a/tools/memshr/interface.c +++ b/tools/memshr/interface.c @@ -145,16 +145,17 @@ void memshr_vbd_image_put(uint16_t memsh int memshr_vbd_issue_ro_request(char *buf, grant_ref_t gref, - uint16_t file_id, + uint16_t file_id, uint64_t sec, int secs, - uint64_t *hnd) + share_tuple_t *hnd) { vbdblk_t blk; - uint64_t s_hnd, c_hnd; + share_tuple_t source_st, client_st; + uint64_t c_hnd; int ret; - *hnd = 0; + *hnd = (share_tuple_t){ 0, 0, 0 }; if(!vbd_info.enabled) return -1; @@ -169,26 +170,31 @@ int memshr_vbd_issue_ro_request(char *bu /* If page couldn't be made sharable, we cannot do anything about it */ if(ret != 0) return -3; - *hnd = c_hnd; + + *(&client_st) = (share_tuple_t){ vbd_info.domid, gref, c_hnd }; + *hnd = client_st; /* Check if we've read matching disk block previously */ blk.sec = sec; blk.disk_id = file_id; - if(blockshr_block_lookup(memshr.blks, blk, &s_hnd) > 0) + if(blockshr_block_lookup(memshr.blks, blk, &source_st) > 0) { - ret = xc_memshr_share(vbd_info.xc_handle, s_hnd, c_hnd); + ret = xc_memshr_share(vbd_info.xc_handle, source_st.domain, source_st.frame, 1, + source_st.handle, vbd_info.domid, gref, c_hnd, 1); if(!ret) return 0; /* Handles failed to be shared => at least one of them must be invalid, remove the relevant ones from the map */ switch(ret) { case XEN_DOMCTL_MEM_SHARING_S_HANDLE_INVALID: - ret = blockshr_shrhnd_remove(memshr.blks, s_hnd, NULL); - if(ret) DPRINTF("Could not rm invl s_hnd: %"PRId64"\n", s_hnd); + ret = blockshr_shrhnd_remove(memshr.blks, source_st, NULL); + if(ret) DPRINTF("Could not rm invl s_hnd: %u %"PRId64" %"PRId64"\n", + source_st.domain, source_st.frame, 
source_st.handle); break; case XEN_DOMCTL_MEM_SHARING_C_HANDLE_INVALID: - ret = blockshr_shrhnd_remove(memshr.blks, c_hnd, NULL); - if(ret) DPRINTF("Could not rm invl c_hnd: %"PRId64"\n", c_hnd); + ret = blockshr_shrhnd_remove(memshr.blks, client_st, NULL); + if(ret) DPRINTF("Could not rm invl c_hnd: %u %"PRId64" %"PRId64"\n", + client_st.domain, client_st.frame, client_st.handle); break; default: break; @@ -199,12 +205,13 @@ int memshr_vbd_issue_ro_request(char *bu return -4; } -void memshr_vbd_complete_ro_request(uint64_t hnd, +void memshr_vbd_complete_ro_request(share_tuple_t hnd, uint16_t file_id, uint64_t sec, int secs) { vbdblk_t blk; + share_tuple_t shr_tuple; if(!vbd_info.enabled) return; diff -r 6ad4a8da105e -r 8d2a8094ace5 tools/memshr/memshr.h --- a/tools/memshr/memshr.h +++ b/tools/memshr/memshr.h @@ -25,6 +25,13 @@ typedef uint64_t xen_mfn_t; +typedef struct share_tuple +{ + uint32_t domain; + uint64_t frame; + uint64_t handle; +} share_tuple_t; + extern void memshr_set_domid(int domid); extern void memshr_daemon_initialize(void); extern void memshr_vbd_initialize(void); @@ -35,9 +42,9 @@ extern int memshr_vbd_issue_ro_request(c uint16_t file_id, uint64_t sec, int secs, - uint64_t *hnd); + share_tuple_t *hnd); extern void memshr_vbd_complete_ro_request( - uint64_t hnd, + share_tuple_t hnd, uint16_t file_id, uint64_t sec, int secs);
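Two conventions in the tuple-based tracking are worth spelling out: a tuple whose handle is 0 means "no share handle attached" (tapdisk tests hnd.handle != 0 before completing a read-only request), and equality must compare all three fields, as blockshr_shrhnd_cmp() does. A stand-alone restatement, with share_tuple_t as defined in memshr.h above and the helper names invented here for illustration:

#include <stdint.h>

typedef struct share_tuple            /* as defined in memshr.h above */
{
    uint32_t domain;                  /* domain owning the page */
    uint64_t frame;                   /* gfn or grant ref naming the page */
    uint64_t handle;                  /* sharing handle/version; 0 = unset */
} share_tuple_t;

static inline int share_tuple_is_set(share_tuple_t t)
{
    /* handle == 0 is the "no handle attached" sentinel */
    return t.handle != 0;
}

static inline int share_tuple_eq(share_tuple_t a, share_tuple_t b)
{
    /* mirrors blockshr_shrhnd_cmp(): all three fields must match */
    return (a.domain == b.domain) &&
           (a.frame == b.frame) &&
           (a.handle == b.handle);
}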
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages
tools/libxl/xl.h | 1 + tools/libxl/xl_cmdimpl.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++ tools/libxl/xl_cmdtable.c | 6 +++ 3 files changed, 92 insertions(+), 0 deletions(-) Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl.h --- a/tools/libxl/xl.h +++ b/tools/libxl/xl.h @@ -28,6 +28,7 @@ struct cmd_spec { int main_vcpulist(int argc, char **argv); int main_info(int argc, char **argv); +int main_sharing(int argc, char **argv); int main_cd_eject(int argc, char **argv); int main_cd_insert(int argc, char **argv); int main_console(int argc, char **argv); diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl_cmdimpl.c --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -3755,6 +3755,91 @@ int main_info(int argc, char **argv) return 0; } +static void sharing(int totals, const libxl_dominfo *info, int nb_domain) +{ + int i; + + printf("Name ID Mem Shared\n"); + + for (i = 0; i < nb_domain; i++) { + char *domname; + unsigned shutdown_reason; + domname = libxl_domid_to_name(ctx, info[i].domid); + shutdown_reason = info[i].shutdown ? info[i].shutdown_reason : 0; + printf("%-40s %5d %5lu %5lu\n", + domname, + info[i].domid, + (unsigned long) (info[i].current_memkb / 1024), + (unsigned long) (info[i].shared_memkb / 1024)); + free(domname); + } + + if (totals) + { + /* To be added with a future patch. */ + } +} + +int main_sharing(int argc, char **argv) +{ + int opt; + int option_index = 0; + static struct option long_options[] = { + {"help", 0, 0, 'h'}, + {"totals", 0, 0, 't'}, + {0, 0, 0, 0} + }; + int totals = 0; + + libxl_dominfo info_buf; + libxl_dominfo *info, *info_free=0; + int nb_domain, rc; + + while ((opt = getopt_long(argc, argv, "ht", long_options, &option_index)) != -1) { + switch (opt) { + case 'h': + help("sharing"); + return 0; + case 't': + totals = 1; + break; + default: + fprintf(stderr, "option `%c' not supported.\n", optopt); + break; + } + } + + if (optind >= argc) { + info = libxl_list_domain(ctx, &nb_domain); + if (!info) { + fprintf(stderr, "libxl_domain_infolist failed.\n"); + return 1; + } + info_free = info; + } else if (optind == argc-1) { + find_domain(argv[optind]); + rc = libxl_domain_info(ctx, &info_buf, domid); + if (rc == ERROR_INVAL) { + fprintf(stderr, "Error: Domain \'%s\' does not exist.\n", + argv[optind]); + return -rc; + } + if (rc) { + fprintf(stderr, "libxl_domain_info failed (code %d).\n", rc); + return -rc; + } + info = &info_buf; + nb_domain = 1; + } else { + help("sharing"); + return 2; + } + + sharing(totals, info, nb_domain); + + return 0; +} + static int sched_credit_domain_get( int domid, libxl_sched_credit *scinfo) { diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl_cmdtable.c --- a/tools/libxl/xl_cmdtable.c +++ b/tools/libxl/xl_cmdtable.c @@ -189,6 +189,12 @@ struct cmd_spec cmd_table[] = { "Get information about Xen host", "-n, --numa List host NUMA topology information", }, + { "sharing", + &main_sharing, 0, + "Get information about page sharing", + "[options] [Domain]", + "-t, --totals Include host totals in the output", + }, { "sched-credit", &main_sched_credit, 0, "Get/set credit scheduler parameters",
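For reference, the %-40s/%5d/%5lu format above produces output along the following lines. The domain names and numbers here are invented purely for illustration (not captured from a real host), and Mem/Shared are in MiB since the kB counters are divided by 1024:

# xl sharing
Name                                        ID   Mem Shared
Domain-0                                     0  2048     0
guest-a                                      1  1024   512
guest-b                                      2  1024   512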
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 09 of 18] x86/mm: Check how many mfns are shared, in addition to how many are saved
xen/arch/x86/mm.c | 6 ------ xen/arch/x86/mm/mem_sharing.c | 31 ++++++++++++++++++++++++++++--- xen/arch/x86/x86_64/compat/mm.c | 6 ++++++ xen/arch/x86/x86_64/mm.c | 7 +++++++ xen/include/asm-x86/mem_sharing.h | 1 + xen/include/public/memory.h | 1 + 6 files changed, 43 insertions(+), 9 deletions(-) This patch also moves the existing sharing-related memory op to the correct location, and adds logic to the audit() method that uses the new information. This patch only provides the Xen implementation of the new memory op; the tools-side wrappers follow in a later patch. Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -119,7 +119,6 @@ #include <xen/trace.h> #include <asm/setup.h> #include <asm/fixmap.h> -#include <asm/mem_sharing.h> /* * Mapping of first 2 or 4 megabytes of memory. This is mapped with 4kB @@ -5093,11 +5092,6 @@ long arch_memory_op(int op, XEN_GUEST_HA return rc; } -#ifdef __x86_64__ - case XENMEM_get_sharing_freed_pages: - return mem_sharing_get_nr_saved_mfns(); -#endif - default: return subarch_memory_op(op, arg); } diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -75,6 +75,7 @@ static inline int mem_sharing_audit(void static shr_handle_t next_handle = 1; static atomic_t nr_saved_mfns = ATOMIC_INIT(0); +static atomic_t nr_shared_mfns = ATOMIC_INIT(0); typedef struct gfn_info { @@ -153,9 +154,12 @@ static struct page_info* mem_sharing_loo static int mem_sharing_audit(void) { int errors = 0; + unsigned long count_expected; + unsigned long count_found = 0; struct list_head *ae; ASSERT(shr_locked_by_me()); + count_expected = atomic_read(&nr_shared_mfns); list_for_each(ae, &shr_audit_list) { @@ -194,6 +198,7 @@ static int mem_sharing_audit(void) errors++; } + count_found++; /* Check if all GFNs map to the MFN, and the p2m types */ list_for_each(le, &pg->shared_info->gfns) { @@ -239,6 +244,13 @@ static int mem_sharing_audit(void) } } + if ( count_found != count_expected ) + { + MEM_SHARING_DEBUG("Expected %ld shared mfns, found %ld.", + count_expected, count_found); + errors++; + } + return errors; } #endif @@ -296,6 +308,11 @@ unsigned int mem_sharing_get_nr_saved_mf return ((unsigned int)atomic_read(&nr_saved_mfns)); } +unsigned int mem_sharing_get_nr_shared_mfns(void) +{ + return (unsigned int)atomic_read(&nr_shared_mfns); +} + int mem_sharing_sharing_resume(struct domain *d) { mem_event_response_t rsp; @@ -570,10 +587,11 @@ int mem_sharing_share_pages(struct domai BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn) == 0); if ( single_client_gfn ) { - /* Only increase the per-domain count when we are actually + /* Only increase the stats counts when we are actually * sharing. And don't increase it should we ever re-share */ atomic_inc(&d->shr_pages); ASSERT( cd == d ); + atomic_inc(&nr_shared_mfns); } put_domain(d); } @@ -654,10 +672,14 @@ gfn_found: if ( flags & MEM_SHARING_DESTROY_GFN ) { mem_sharing_gfn_destroy(gfn_info, !last_gfn); - if ( !last_gfn ) + if ( last_gfn ) + { + atomic_dec(&nr_shared_mfns); + } else { /* Even though we don't allocate a private page, we have to account * for the MFN that originally backed this PFN. 
*/ atomic_dec(&nr_saved_mfns); + } if ( last_gfn ) { @@ -707,8 +729,11 @@ gfn_found: put_page_and_type(old_page); private_page_found: - if ( !last_gfn ) + if ( last_gfn ) { + atomic_dec(&nr_shared_mfns); + } else { atomic_dec(&nr_saved_mfns); + } if ( p2m_change_type(d, gfn, p2m_ram_shared, p2m_ram_rw) != p2m_ram_shared ) diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/x86_64/compat/mm.c --- a/xen/arch/x86/x86_64/compat/mm.c +++ b/xen/arch/x86/x86_64/compat/mm.c @@ -205,6 +205,12 @@ int compat_arch_memory_op(int op, XEN_GU break; } + case XENMEM_get_sharing_freed_pages: + return mem_sharing_get_nr_saved_mfns(); + + case XENMEM_get_sharing_shared_pages: + return mem_sharing_get_nr_shared_mfns(); + default: rc = -ENOSYS; break; diff -r 24d514cd4dee -r 65b32b391373 xen/arch/x86/x86_64/mm.c --- a/xen/arch/x86/x86_64/mm.c +++ b/xen/arch/x86/x86_64/mm.c @@ -34,6 +34,7 @@ #include <asm/msr.h> #include <asm/setup.h> #include <asm/numa.h> +#include <asm/mem_sharing.h> #include <public/memory.h> /* Parameters for PFN/MADDR compression. */ @@ -1090,6 +1091,12 @@ long subarch_memory_op(int op, XEN_GUEST break; + case XENMEM_get_sharing_freed_pages: + return mem_sharing_get_nr_saved_mfns(); + + case XENMEM_get_sharing_shared_pages: + return mem_sharing_get_nr_shared_mfns(); + default: rc = -ENOSYS; break; diff -r 24d514cd4dee -r 65b32b391373 xen/include/asm-x86/mem_sharing.h --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -40,6 +40,7 @@ struct page_sharing_info (is_hvm_domain(_d) && paging_mode_hap(_d)) unsigned int mem_sharing_get_nr_saved_mfns(void); +unsigned int mem_sharing_get_nr_shared_mfns(void); int mem_sharing_nominate_page(struct domain *d, unsigned long gfn, int expected_refcnt, diff -r 24d514cd4dee -r 65b32b391373 xen/include/public/memory.h --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -294,6 +294,7 @@ typedef struct xen_pod_target xen_pod_ta * The call never fails. */ #define XENMEM_get_sharing_freed_pages 18 +#define XENMEM_get_sharing_shared_pages 19 #endif /* __XEN_PUBLIC_MEMORY_H__ */
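A compact way to review the counter semantics introduced here:
nr_shared_mfns goes up the first time a frame is actually shared and comes
back down when its last gfn goes away, while nr_saved_mfns counts one frame
freed per gfn beyond the first. The toy model below (plain ints standing in
for atomic_t, invented helper names, not Xen code) walks one frame through
share and unshare and ends with both counters back at zero:

    #include <stdio.h>

    /* Toy model of the two counters; plain ints stand in for atomic_t. */
    static int nr_saved_mfns;    /* frames freed: one per gfn beyond the first */
    static int nr_shared_mfns;   /* frames that have entered the shared state  */

    static void share(int *gfns_on_frame)     /* another gfn joins this frame */
    {
        if (*gfns_on_frame == 1)
            nr_shared_mfns++;                 /* first real share of this frame */
        (*gfns_on_frame)++;
        nr_saved_mfns++;                      /* one frame freed by this share */
    }

    static void unshare(int *gfns_on_frame)   /* a gfn leaves this frame */
    {
        (*gfns_on_frame)--;
        if (*gfns_on_frame == 0)
            nr_shared_mfns--;                 /* last gfn gone */
        else
            nr_saved_mfns--;                  /* a private copy was re-created */
    }

    int main(void)
    {
        int gfns = 1;                         /* a freshly nominated frame */
        share(&gfns); share(&gfns);           /* three gfns share one frame */
        unshare(&gfns); unshare(&gfns); unshare(&gfns);
        printf("saved=%d shared=%d\n", nr_saved_mfns, nr_shared_mfns); /* 0 0 */
        return 0;
    }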
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 10 of 18] Tools: Expose to libxc the total number of shared frames and space saved
tools/libxc/xc_private.c |  10 ++++++++++
tools/libxc/xenctrl.h    |   4 ++++
2 files changed, 14 insertions(+), 0 deletions(-)

Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 65b32b391373 -r 6f489a294a76 tools/libxc/xc_private.c
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -533,6 +533,16 @@ long xc_maximum_ram_page(xc_interface *x
     return do_memory_op(xch, XENMEM_maximum_ram_page, NULL, 0);
 }
 
+long xc_sharing_freed_pages(xc_interface *xch)
+{
+    return do_memory_op(xch, XENMEM_get_sharing_freed_pages, NULL, 0);
+}
+
+long xc_sharing_used_frames(xc_interface *xch)
+{
+    return do_memory_op(xch, XENMEM_get_sharing_shared_pages, NULL, 0);
+}
+
 long long xc_domain_get_cpu_usage( xc_interface *xch, domid_t domid, int vcpu )
 {
     DECLARE_DOMCTL;
diff -r 65b32b391373 -r 6f489a294a76 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1227,6 +1227,10 @@ int xc_mmuext_op(xc_interface *xch, stru
 /* System wide memory properties */
 long xc_maximum_ram_page(xc_interface *xch);
 
+long xc_sharing_freed_pages(xc_interface *xch);
+
+long xc_sharing_used_frames(xc_interface *xch);
+
 /* Get current total pages allocated to a domain. */
 long xc_get_tot_pages(xc_interface *xch, uint32_t domid);
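A minimal caller for the two new wrappers might look like the sketch below.
It assumes the three-argument xc_interface_open() of this libxc generation,
so treat it as illustrative rather than definitive:

    #include <stdio.h>
    #include <xenctrl.h>

    int main(void)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        if ( !xch )
            return 1;
        /* Both wrappers return raw page/frame counts from the memory ops. */
        printf("freed by sharing:  %ld pages\n", xc_sharing_freed_pages(xch));
        printf("in use by sharing: %ld frames\n", xc_sharing_used_frames(xch));
        xc_interface_close(xch);
        return 0;
    }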
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 11 of 18] Tools: Allow libxl/xl to expose the total number of shared frames and space saved
tools/libxl/Makefile     |   2 +-
tools/libxl/xl_cmdimpl.c |  13 ++++++++++---
2 files changed, 11 insertions(+), 4 deletions(-)

Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 6f489a294a76 -r cde3529132c1 tools/libxl/Makefile
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -89,7 +89,7 @@ libxl_internal.h: _libxl_types_internal.
 libxl_internal_json.h: _libxl_types_internal_json.h
 
 $(LIBXL_OBJS) $(LIBXLU_OBJS) $(XL_OBJS): libxl.h
-$(LIBXL_OBJS): libxl_internal.h
+$(LIBXL_OBJS) $(XL_OBJS): libxl_internal.h
 
 _libxl_type%.h _libxl_type%_json.h _libxl_type%.c: libxl_type%.idl gentypes.py libxltypes.py
 	$(PYTHON) gentypes.py libxl_type$*.idl __libxl_type$*.h __libxl_type$*_json.h __libxl_type$*.c
diff -r 6f489a294a76 -r cde3529132c1 tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -38,6 +38,7 @@
 
 #include "libxl.h"
 #include "libxl_utils.h"
+#include "libxl_internal.h"
 #include "libxlutil.h"
 #include "xl.h"
 
@@ -3758,6 +3759,7 @@ int main_info(int argc, char **argv)
 static void sharing(int totals, const libxl_dominfo *info, int nb_domain)
 {
     int i;
+    const libxl_version_info *vinfo;
 
     printf("Name                                        ID   Mem Shared\n");
 
@@ -3774,9 +3776,14 @@ static void sharing(int totals, const li
         free(domname);
     }
 
-    if (totals)
-    {
-        /* To be added with a future patch. */
+    if (totals) {
+        vinfo = libxl_get_version_info(ctx);
+        if (vinfo) {
+            i = (1 << 20) / vinfo->pagesize;
+            printf("Freed with sharing: %ld Total physical shared: %ld\n",
+                   xc_sharing_freed_pages(ctx->xch) / i,
+                   xc_sharing_used_frames(ctx->xch) / i);
+        }
     }
 }
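The totals arithmetic deserves a second look: i = (1 << 20) / pagesize is
the number of pages per MiB, so dividing the page counts by i converts them
to MiB. A worked example with the usual 4096-byte page (the page count is an
invented value):

    #include <stdio.h>

    int main(void)
    {
        unsigned long pagesize = 4096;             /* typical x86 page size */
        unsigned long i = (1UL << 20) / pagesize;  /* 256 pages per MiB     */
        unsigned long freed_pages = 131072;        /* invented counter value */
        printf("Freed with sharing: %lu MiB\n", freed_pages / i);  /* 512 */
        return 0;
    }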
Andres Lagar-Cavilla
2011-Dec-08 07:47 UTC
[PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
xen/arch/x86/mm.c        |   4 ++--
xen/include/asm-x86/mm.h |   3 +++
2 files changed, 5 insertions(+), 2 deletions(-)

This is necessary for a new consumer of page_lock/unlock to follow in
the series.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r cde3529132c1 -r ecf95feef9ac xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1718,7 +1718,7 @@ static int free_l4_table(struct page_inf
 #define free_l4_table(page, preemptible) (-EINVAL)
 #endif
 
-static int page_lock(struct page_info *page)
+int page_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -1735,7 +1735,7 @@ static int page_lock(struct page_info *p
     return 1;
 }
 
-static void page_unlock(struct page_info *page)
+void page_unlock(struct page_info *page)
 {
     unsigned long x, nx, y = page->u.inuse.type_info;
 
diff -r cde3529132c1 -r ecf95feef9ac xen/include/asm-x86/mm.h
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -337,6 +337,9 @@ int is_iomem_page(unsigned long mfn);
 
 void clear_superpage_mark(struct page_info *page);
 
+int page_lock(struct page_info *page);
+void page_unlock(struct page_info *page);
+
 struct domain *page_get_owner_and_reference(struct page_info *page);
 void put_page(struct page_info *page);
 int get_page(struct page_info *page, struct domain *domain);
Tim Deegan
2011-Dec-08 11:11 UTC
Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c
At 02:47 -0500 on 08 Dec (1323312436), Andres Lagar-Cavilla wrote:
> Harmless patch, with no functional changes, only code style issues.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

...

> -    return -2;
> +    return 0;

Ahem. :)

Tim.
Tim Deegan
2011-Dec-08 11:16 UTC
Re: [PATCH 02 of 18] x86/mm: Making a page sharable sets PGT_validated, but making a page private doesn''t expect it
At 02:47 -0500 on 08 Dec (1323312437), Andres Lagar-Cavilla wrote:
> diff -r aeebbff899ff -r 61da3fc46f06 xen/arch/x86/mm.c
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4347,7 +4347,7 @@ int page_make_private(struct domain *d,
>
>      /* We can only change the type if count is one */
>      if ( (page->u.inuse.type_info & (PGT_type_mask | PGT_count_mask))
> -         != (PGT_shared_page | 1) )
> +         != (PGT_shared_page | PGT_validated | 1) )

This seems wrong - surely PGT_validated isn't in (PGT_type_mask |
PGT_count_mask).

Tim.
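The objection is a pure masking identity: a bit cleared by the mask on the
left-hand side can never match the same bit set on the right-hand side. A
small self-check with invented bit positions (not Xen's real PGT_* values)
makes this concrete:

    #include <assert.h>

    int main(void)
    {
        /* Invented bit layout; not Xen's real PGT_* values. */
        unsigned long TYPE_MASK  = 0x0f0;  /* stands in for PGT_type_mask   */
        unsigned long COUNT_MASK = 0x00f;  /* stands in for PGT_count_mask  */
        unsigned long SHARED     = 0x010;  /* stands in for PGT_shared_page */
        unsigned long VALIDATED  = 0x100;  /* outside both masks            */

        for (unsigned long x = 0; x < 0x1000; x++)
            /* Masking clears VALIDATED on the left, but the right-hand side
             * has it set, so the comparison can never succeed. */
            assert((x & (TYPE_MASK | COUNT_MASK)) != (SHARED | VALIDATED | 1));
        return 0;
    }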
Andres Lagar-Cavilla
2011-Dec-08 16:16 UTC
Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c
> At 02:47 -0500 on 08 Dec (1323312436), Andres Lagar-Cavilla wrote:
>> Harmless patch, with no functional changes, only code style issues.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> ...
>
>> -    return -2;
>> +    return 0;
>
> Ahem. :)
>
> Tim.

Borderline, I'll admit. I can play down the tone of the header -- nothing
is really harmless :)

The function will still produce the same outputs on the same inputs, with
cleaner code.

Thanks
Andres
Tim Deegan
2011-Dec-08 21:54 UTC
Re: [PATCH 01 of 18] x86/mm: Code style fixes in mem_sharing.c
At 08:16 -0800 on 08 Dec (1323332187), Andres Lagar-Cavilla wrote:
> > At 02:47 -0500 on 08 Dec (1323312436), Andres Lagar-Cavilla wrote:
> >> Harmless patch, with no functional changes, only code style issues.
> >>
> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> >
> > ...
> >
> >> -    return -2;
> >> +    return 0;
> >
> > Ahem. :)
> >
> > Tim.
> Borderline, I'll admit. I can play down the tone of the header -- nothing
> is really harmless :)
>
> The function will still produce the same outputs on the same inputs, with
> cleaner code.

Ah, so it will. I'll apply this, but in future I'd like long patches that
do tedious whitespace fixups to _just_ have whitespace fixups. It reduces
the chance of nasty surprises (and the chance that I'll have to read a long
patch of tedious whitespace fixups more than once!)

Tim.
Tim Deegan
2011-Dec-08 22:13 UTC
Re: [PATCH 03 of 18] x86/mm: Eliminate hash table in sharing code as index of shared mfns
At 02:47 -0500 on 08 Dec (1323312438), Andres Lagar-Cavilla wrote:
> xen/arch/x86/mm/mem_sharing.c     | 526 +++++++++++++++++++------------------
> xen/include/asm-x86/mem_sharing.h |  15 +-
> xen/include/asm-x86/mm.h          |  11 +-
> xen/include/public/domctl.h       |   3 +
> 4 files changed, 296 insertions(+), 259 deletions(-)
>
>
> The hashtable mechanism used by the sharing code is a bit odd. It seems
> unnecessary and adds complexity. Instead, we replace this by storing a list
> head directly in the page info for the case when the page is shared. This does
> not add any extra space to the page_info and serves to remove significant
> complexity from sharing.
>
> Signed-off-by: Adin Scannell <adin@scannell.ca>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

I like the look of this but I won't apply until Olaf is happy with the
interface changes. One nit:

> diff -r 61da3fc46f06 -r 3038770886aa xen/include/asm-x86/mem_sharing.h
> --- a/xen/include/asm-x86/mem_sharing.h
> +++ b/xen/include/asm-x86/mem_sharing.h
> @@ -22,13 +22,23 @@
>  #ifndef __MEM_SHARING_H__
>  #define __MEM_SHARING_H__
>
> +#include <public/domctl.h>
> +
> +typedef uint64_t shr_handle_t;
> +
> +struct page_sharing_info
> +{
> +    struct page_info *pg;   /* Back pointer to the page. */
> +    shr_handle_t handle;    /* Globally unique version / handle. */
> +    struct list_head entry; /* List of all shared pages (entry). */

IIUC this list is only used if the sharing audit code is enabled, so it
should probably be appropriately #ifdeffed out to save space in the
non-audit case.

Cheers,

Tim.

> +    struct list_head gfns;  /* List of domains and gfns for this page (head). */
> +};
> +
>  #ifdef __x86_64__
>
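The nit could be addressed along these lines -- a sketch of the quoted
struct only, assuming the MEM_SHARING_AUDIT compile-time switch is made
visible to the header:

    typedef uint64_t shr_handle_t;

    struct page_sharing_info
    {
        struct page_info *pg;   /* Back pointer to the page. */
        shr_handle_t handle;    /* Globally unique version / handle. */
    #if MEM_SHARING_AUDIT
        struct list_head entry; /* List of all shared pages (audit only). */
    #endif
        struct list_head gfns;  /* List of domains and gfns for this page (head). */
    };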
Tim Deegan
2011-Dec-08 22:20 UTC
Re: [PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants
Hi,

At 02:47 -0500 on 08 Dec (1323312439), Andres Lagar-Cavilla wrote:
> +        if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) )
> +        {
> +            grant_ref_t gref = (grant_ref_t)
> +                                (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
> +                                    mec->u.share.client_gfn));
> +            if ( (!source_is_gref) ||
> +                 (mem_sharing_gref_to_gfn(cd, gref, &cgfn) < 0) )
> +            {
> +                put_domain(cd);
> +                return -EINVAL;
> +            }
> +        } else {
> +            if ( source_is_gref )
> +            {
> +                put_domain(cd);
> +                return -EINVAL;
> +            }

Why can't one input be a grant and the other not? If there's a problem
with that it should probably be documented in the hypercall interface.

Cheers,

Tim.
Tim Deegan
2011-Dec-08 22:27 UTC
Re: [PATCH 09 of 18] x86/mm: Check how many mfns are shared, in addition to how many are saved
At 02:47 -0500 on 08 Dec (1323312444), Andres Lagar-Cavilla wrote:
> This patch also moves the existing sharing-related memory op to the
> correct location, and adds logic to the audit() method that uses the
> new information.
>
> This patch only provides the Xen implementation of the domctls.
>
> Signed-off-by: Adin Scannell <adin@scannell.ca>

Acked-by: Tim Deegan <tim@xen.org> (once patches #3 and #4 are in).

Tim.
Tim Deegan
2011-Dec-08 22:38 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
> This is necessary for a new consumer of page_lock/unlock to follow in
> the series.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

Nak, I'm afraid.

These were OK as local functions but if they're going to be made
generally visible, they need clear comments describing what this
locking protects and what the discipline is for avoiding deadlocks.

Perhaps Jan or Keir can supply appropriate words. The locking was
introduced in this cset:

changeset:   17846:09dd5999401b
user:        Keir Fraser <keir.fraser@citrix.com>
date:        Thu Jun 12 18:14:00 2008 +0100
files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c xen/arch/x86/mm.c
description:
x86: remove use of per-domain lock from page table entry handling

This change results in a 5% performance improvement for kernel builds
on dual-socket quad-core systems (which is what I used for reference
for both 32- and 64-bit). Along with that, the amount of time reported
as spent in the kernel gets reduced by almost 25% (the fraction of
time spent in the kernel is generally reported significantly higher
under Xen than with a native kernel).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>

Tim.
Keir Fraser
2011-Dec-08 23:06 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org> wrote:

> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>> This is necessary for a new consumer of page_lock/unlock to follow in
>> the series.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> Nak, I'm afraid.
>
> These were OK as local functions but if they're going to be made
> generally visible, they need clear comments describing what this
> locking protects and what the discipline is for avoiding deadlocks.
>
> Perhaps Jan or Keir can supply appropriate words. The locking was
> introduced in this cset:

It's Jan's work originally, but the basic intention of page_lock is to
serialise pte updates. To aid with this, a page's type cannot change while
its lock is held. No lock nests inside a page lock (not even other page
locks) so there is no deadlock risk.

> changeset:   17846:09dd5999401b
> user:        Keir Fraser <keir.fraser@citrix.com>
> date:        Thu Jun 12 18:14:00 2008 +0100
> files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c xen/arch/x86/mm.c
> description:
> x86: remove use of per-domain lock from page table entry handling
>
> This change results in a 5% performance improvement for kernel builds
> on dual-socket quad-core systems (which is what I used for reference
> for both 32- and 64-bit). Along with that, the amount of time reported
> as spent in the kernel gets reduced by almost 25% (the fraction of
> time spent in the kernel is generally reported significantly higher
> under Xen than with a native kernel).
>
> Signed-off-by: Jan Beulich <jbeulich@novell.com>
> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>
> Tim.
Andres Lagar-Cavilla
2011-Dec-09 02:54 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>> This is necessary for a new consumer of page_lock/unlock to follow in
>> the series.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> Nak, I'm afraid.
>
> These were OK as local functions but if they're going to be made
> generally visible, they need clear comments describing what this
> locking protects and what the discipline is for avoiding deadlocks.

How about something along the lines of
"page_lock() is used for two purposes: pte serialization, and memory
sharing. All users of page lock for pte serialization live in mm.c, use it
to lock a page table page during pte updates, do not take other locks
within the critical section delimited by page_lock/unlock, and perform no
nesting. All users of page lock for memory sharing live in
mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
locking) two pages -- deadlock is avoided by locking pages in increasing
order. Memory sharing may take the p2m_lock within a page_lock/unlock
critical section. These two users (pte serialization and memory sharing)
should never collide, as long as page table pages are properly unshared
prior to updating."

Now those are all pretty words, but here are the two things I (think I)
need to do:
- Prior to updating pte's, we do get_gfn on the page table page. We should
be using get_gfn_unshare. Regardless of this patch. It's likely never
going to trigger an actual unshare, yet better safe than sorry.
- I can wrap uses of page_lock in mm sharing in an "external"
order-enforcing construct from mm-locks.h. And use that to scream deadlock
between page_lock and p2m_lock.

The code that actually uses page_lock()s in the memory sharing code can be
found in "[PATCH] x86/mm: Eliminate global lock in mem sharing code". It
already orders locking of individual pages in ascending order.

Andres

> Perhaps Jan or Keir can supply appropriate words. The locking was
> introduced in this cset:
>
> changeset:   17846:09dd5999401b
> user:        Keir Fraser <keir.fraser@citrix.com>
> date:        Thu Jun 12 18:14:00 2008 +0100
> files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c xen/arch/x86/mm.c
> description:
> x86: remove use of per-domain lock from page table entry handling
>
> This change results in a 5% performance improvement for kernel builds
> on dual-socket quad-core systems (which is what I used for reference
> for both 32- and 64-bit). Along with that, the amount of time reported
> as spent in the kernel gets reduced by almost 25% (the fraction of
> time spent in the kernel is generally reported significantly higher
> under Xen than with a native kernel).
>
> Signed-off-by: Jan Beulich <jbeulich@novell.com>
> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>
> Tim.
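The ascending-order rule in the second bullet point can be sketched as
follows; pg_a and pg_b are the two candidate pages, and the helper name is
invented here -- this illustrates the discipline, not the actual series
code:

    /* Lock two pages for sharing without risking deadlock: always take
     * the lower-MFN page first; a self-share needs only one lock. */
    static void lock_share_pair(struct page_info *pg_a, struct page_info *pg_b)
    {
        if ( pg_a == pg_b )
        {
            page_lock(pg_a);
            return;
        }
        if ( mfn_x(page_to_mfn(pg_a)) < mfn_x(page_to_mfn(pg_b)) )
        {
            page_lock(pg_a);
            page_lock(pg_b);
        }
        else
        {
            page_lock(pg_b);
            page_lock(pg_a);
        }
    }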
Andres Lagar-Cavilla
2011-Dec-09 02:57 UTC
Re: [PATCH 04 of 18] x86/mm: Update mem sharing interface to (re)allow sharing of grants
> Hi,
>
> At 02:47 -0500 on 08 Dec (1323312439), Andres Lagar-Cavilla wrote:
>> +        if ( XEN_DOMCTL_MEM_SHARING_FIELD_IS_GREF(mec->u.share.client_gfn) )
>> +        {
>> +            grant_ref_t gref = (grant_ref_t)
>> +                                (XEN_DOMCTL_MEM_SHARING_FIELD_GET_GREF(
>> +                                    mec->u.share.client_gfn));
>> +            if ( (!source_is_gref) ||
>> +                 (mem_sharing_gref_to_gfn(cd, gref, &cgfn) < 0) )
>> +            {
>> +                put_domain(cd);
>> +                return -EINVAL;
>> +            }
>> +        } else {
>> +            if ( source_is_gref )
>> +            {
>> +                put_domain(cd);
>> +                return -EINVAL;
>> +            }
>
> Why can't one input be a grant and the other not? If there's a problem
> with that it should probably be documented in the hypercall interface.

No particular problem. The gref use case is the current target of
tools/memshr, which always does gref-to-gref sharing. I imagine gref-to-gfn
sharing to be extremely unlikely. But there is no need to disallow it.

Andres

> Cheers,
>
> Tim.
Andres Lagar-Cavilla
2011-Dec-09 03:01 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
> On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org> wrote:
>
>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>> the series.
>>>
>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>
>> Nak, I'm afraid.
>>
>> These were OK as local functions but if they're going to be made
>> generally visible, they need clear comments describing what this
>> locking protects and what the discipline is for avoiding deadlocks.
>>
>> Perhaps Jan or Keir can supply appropriate words. The locking was
>> introduced in this cset:
>
> It's Jan's work originally, but the basic intention of page_lock is to
> serialise pte updates. To aid with this, a page's type cannot change while
> its lock is held.

That's definitely a property I want to leverage.

> No lock nests inside a page lock (not even other page
> locks) so there is no deadlock risk.

There's no way to not nest when sharing two pages, but I always make sure
I lock in ascending order.

Thanks,
Andres

>> changeset:   17846:09dd5999401b
>> user:        Keir Fraser <keir.fraser@citrix.com>
>> date:        Thu Jun 12 18:14:00 2008 +0100
>> files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c xen/arch/x86/mm.c
>> description:
>> x86: remove use of per-domain lock from page table entry handling
>>
>> This change results in a 5% performance improvement for kernel builds
>> on dual-socket quad-core systems (which is what I used for reference
>> for both 32- and 64-bit). Along with that, the amount of time reported
>> as spent in the kernel gets reduced by almost 25% (the fraction of
>> time spent in the kernel is generally reported significantly higher
>> under Xen than with a native kernel).
>>
>> Signed-off-by: Jan Beulich <jbeulich@novell.com>
>> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>>
>> Tim.
Keir Fraser
2011-Dec-09 08:17 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
On 09/12/2011 03:01, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:

>> On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org> wrote:
>>
>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>>> the series.
>>>>
>>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>
>>> Nak, I'm afraid.
>>>
>>> These were OK as local functions but if they're going to be made
>>> generally visible, they need clear comments describing what this
>>> locking protects and what the discipline is for avoiding deadlocks.
>>>
>>> Perhaps Jan or Keir can supply appropriate words. The locking was
>>> introduced in this cset:
>>
>> It's Jan's work originally, but the basic intention of page_lock is to
>> serialise pte updates. To aid with this, a page's type cannot change while
>> its lock is held.
> That's definitely a property I want to leverage.
>
>> No lock nests inside a page lock (not even other page
>> locks) so there is no deadlock risk.
> There's no way to not nest when sharing two pages, but I always make sure
> I lock in ascending order.

The fact there is currently no nesting gives you some freedom. Ordered
locking of other page locks is obviously going to be safe. So is taking a
page lock inside any other lock. Taking some other lock inside a page lock
is all that needs care, but there probably aren't many locks that currently
can have page locks nested inside them (and hence you would risk deadlock
by nesting the other way).

 -- Keir

> Thanks,
> Andres
>>
>>> changeset:   17846:09dd5999401b
>>> user:        Keir Fraser <keir.fraser@citrix.com>
>>> date:        Thu Jun 12 18:14:00 2008 +0100
>>> files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c xen/arch/x86/mm.c
>>> description:
>>> x86: remove use of per-domain lock from page table entry handling
>>>
>>> This change results in a 5% performance improvement for kernel builds
>>> on dual-socket quad-core systems (which is what I used for reference
>>> for both 32- and 64-bit). Along with that, the amount of time reported
>>> as spent in the kernel gets reduced by almost 25% (the fraction of
>>> time spent in the kernel is generally reported significantly higher
>>> under Xen than with a native kernel).
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@novell.com>
>>> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>>>
>>> Tim.
Jan Beulich
2011-Dec-09 08:51 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>> the series.
>>>
>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>
>> Nak, I'm afraid.
>>
>> These were OK as local functions but if they're going to be made
>> generally visible, they need clear comments describing what this
>> locking protects and what the discipline is for avoiding deadlocks.
>
> How about something along the lines of
> "page_lock() is used for two purposes: pte serialization, and memory
> sharing. All users of page lock for pte serialization live in mm.c, use it
> to lock a page table page during pte updates, do not take other locks
> within the critical section delimited by page_lock/unlock, and perform no
> nesting. All users of page lock for memory sharing live in
> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
> locking) two pages -- deadlock is avoided by locking pages in increasing
> order. Memory sharing may take the p2m_lock within a page_lock/unlock
> critical section. These two users (pte serialization and memory sharing)
> should never collide, as long as page table pages are properly unshared
> prior to updating."

This would seem to me like very undesirable lock ordering - a very
coarse (per-domain) lock taken inside a very fine grained (per-page)
one.

> Now those are all pretty words, but here are the two things I (think I)
> need to do:
> - Prior to updating pte's, we do get_gfn on the page table page. We should
> be using get_gfn_unshare. Regardless of this patch. It's likely never
> going to trigger an actual unshare, yet better safe than sorry.

Does memory sharing work on pv domains at all?

> - I can wrap uses of page_lock in mm sharing in an "external"
> order-enforcing construct from mm-locks.h. And use that to scream deadlock
> between page_lock and p2m_lock.
>
> The code that actually uses page_lock()s in the memory sharing code can be
> found in "[PATCH] x86/mm: Eliminate global lock in mem sharing code". It
> already orders locking of individual pages in ascending order.

It should be this patch to make the functions externally visible, not a
separate one (or at the very minimum the two ought to be in the same
series, back to back). Which is not to say that I'm fully convinced this
is a good idea.

Jan
Ian Jackson
2011-Dec-09 09:59 UTC
Re: [PATCH 06 of 18] Tools: Update libxc mem sharing interface
Andres Lagar-Cavilla writes ("[PATCH 06 of 18] Tools: Update libxc mem sharing interface"):> tools/libxc/xc_memshr.c | 19 ++++++++++++++++++- > tools/libxc/xenctrl.h | 7 ++++++- > 2 files changed, 24 insertions(+), 2 deletions(-) > > > Previosuly, the mem sharing code would return an opaque handle > to index shared pages (and nominees) in its global hash table. > By removing the hash table, the handle becomes a version, to > avoid sharing a stale version of a page. Thus, libxc wrappers > need to be updated accordingly. > > Signed-off-by: Adin Scannell <adin@scannell.ca> > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> (assuming the hypervisor changes are OK, obviously) Ian.
Ian Jackson
2011-Dec-09 09:59 UTC
Re: [PATCH 07 of 18] Tools: Update memshr tool to use new sharing API
Andres Lagar-Cavilla writes ("[PATCH 07 of 18] Tools: Update memshr tool to use new sharing API"):> The only (in-tree, that we know of) consumer of the mem sharing API > is the memshr tool (conditionally linked into blktap2). Update it to > use the new API. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2011-Dec-09 10:01 UTC
Re: [PATCH 05 of 18] Tools: Do not assume sharing target is dom0 in libxc wrappers
Andres Lagar-Cavilla writes ("[PATCH 05 of 18] Tools: Do not assume sharing target is dom0 in libxc wrappers"):> The libxc wrapper that shares two pages always assumed one > of the pages was mapped by dom0. Expand the API to specify > both sharing domains. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2011-Dec-09 10:02 UTC
Re: [PATCH 11 of 18] Tools: Allow libxl/xl to expose the total number of shared frames and space saved
Andres Lagar-Cavilla writes ("[PATCH 11 of 18] Tools: Allow libxl/xl to expose the total number of shared frames and space saved"):> diff -r 6f489a294a76 -r cde3529132c1 tools/libxl/xl_cmdimpl.c > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -38,6 +38,7 @@ > > #include "libxl.h" > #include "libxl_utils.h" > +#include "libxl_internal.h"No, this is not correct. xl should not include libxl_internal.h. Nor should it make libxc calls directly. Ian.
Ian Jackson
2011-Dec-09 10:08 UTC
Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages
Andres Lagar-Cavilla writes ("[PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages"):> +static void sharing(int totals, const libxl_dominfo *info, int nb_domain) > +{ > + int i; > + > + printf("Name ID Mem Shared\n"); > + > + for (i = 0; i < nb_domain; i++) { > + char *domname; > + unsigned shutdown_reason; > + domname = libxl_domid_to_name(ctx, info[i].domid); > + shutdown_reason = info[i].shutdown ? info[i].shutdown_reason : 0; > + printf("%-40s %5d %5lu %5lu\n", > + domname, > + info[i].domid, > + (unsigned long) (info[i].current_memkb / 1024), > + (unsigned long) (info[i].shared_memkb / 1024)); > + free(domname); > + } > + > + if (totals) > + { > + /* To be added with a future patch. */ > + } > +}Is this analogous to an "xm sharing" command ? Perhaps we should try to integrate this information in the output of "xl list" (although we do have some backward compatibility issues there). Maybe we need to invent "xl llist" or "xl list -v" or something. Ian.
Ian Campbell
2011-Dec-09 10:10 UTC
Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages
On Thu, 2011-12-08 at 07:47 +0000, Andres Lagar-Cavilla wrote:
> diff -r 8d2a8094ace5 -r 24d514cd4dee tools/libxl/xl_cmdtable.c
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -189,6 +189,12 @@ struct cmd_spec cmd_table[] = {
>        "Get information about Xen host",
>        "-n, --numa         List host NUMA topology information",
>      },
> +    { "sharing",
> +      &main_sharing, 0,
> +      "Get information about page sharing",
> +      "[options] [Domain]",
> +      "-t, --totals       Include host totals in the output",
> +    },
>      { "sched-credit",
>        &main_sched_credit, 0,
>        "Get/set credit scheduler parameters",

Please also patch docs/man/xl.pod.1 when adding new commands.

Ian.
Ian Jackson
2011-Dec-09 11:29 UTC
Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages
Ian Campbell writes ("Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages"):
> Please also patch docs/man/xl.pod.1 when adding new commands.

Oh, good point, yes.

Ian.
Andres Lagar-Cavilla
2011-Dec-09 14:43 UTC
Re: [PATCH 08 of 18] Tools: Add a sharing command to xl for information about shared pages
> Andres Lagar-Cavilla writes ("[PATCH 08 of 18] Tools: Add a sharing
> command to xl for information about shared pages"):
>> +static void sharing(int totals, const libxl_dominfo *info, int nb_domain)
>> +{
>> +    int i;
>> +
>> +    printf("Name                                        ID   Mem Shared\n");
>> +
>> +    for (i = 0; i < nb_domain; i++) {
>> +        char *domname;
>> +        unsigned shutdown_reason;
>> +        domname = libxl_domid_to_name(ctx, info[i].domid);
>> +        shutdown_reason = info[i].shutdown ? info[i].shutdown_reason : 0;
>> +        printf("%-40s %5d %5lu %5lu\n",
>> +               domname,
>> +               info[i].domid,
>> +               (unsigned long) (info[i].current_memkb / 1024),
>> +               (unsigned long) (info[i].shared_memkb / 1024));
>> +        free(domname);
>> +    }
>> +
>> +    if (totals)
>> +    {
>> +        /* To be added with a future patch. */
>> +    }
>> +}
>
> Is this analogous to an "xm sharing" command?
>
> Perhaps we should try to integrate this information in the output of
> "xl list" (although we do have some backward compatibility issues
> there).
>
> Maybe we need to invent "xl llist" or "xl list -v" or something.

I seem to recall that several weeks ago, when Adin first posted this
patch, he was asked to not break xl list compatibility with xm list.
Hence the separate command.

Will also try to add doc to xl.pod.1.

Thanks
Andres

> Ian.
Andres Lagar-Cavilla
2011-Dec-09 14:47 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
> On 09/12/2011 03:01, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
> wrote:
>
>>> On 08/12/2011 22:38, "Tim Deegan" <tim@xen.org> wrote:
>>>
>>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>>>> the series.
>>>>>
>>>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>
>>>> Nak, I'm afraid.
>>>>
>>>> These were OK as local functions but if they're going to be made
>>>> generally visible, they need clear comments describing what this
>>>> locking protects and what the discipline is for avoiding deadlocks.
>>>>
>>>> Perhaps Jan or Keir can supply appropriate words. The locking was
>>>> introduced in this cset:
>>>
>>> It's Jan's work originally, but the basic intention of page_lock is to
>>> serialise pte updates. To aid with this, a page's type cannot change
>>> while its lock is held.
>> That's definitely a property I want to leverage.
>>
>>> No lock nests inside a page lock (not even other page
>>> locks) so there is no deadlock risk.
>> There's no way to not nest when sharing two pages, but I always make sure
>> I lock in ascending order.
>
> The fact there is currently no nesting gives you some freedom. Ordered
> locking of other page locks is obviously going to be safe. So is taking a
> page lock inside any other lock. Taking some other lock inside a page lock
> is all that needs care, but there probably aren't many locks that currently
> can have page locks nested inside them (and hence you would risk deadlock
> by nesting the other way).

For now, p2m_lock will be taken inside page_lock, to type p2m entries or
make them point to the shared page. Later, when p2m lookups become
synchronized, p2m_lock (or similar) will be taken before page_lock, in
get_gfn* -- and that will be better.

Either way, I will use mm-locks.h to make sure that if there is any chance
of deadlock the hypervisor screams.

Thanks
Andres

>
> -- Keir
>
>> Thanks,
>> Andres
>>>
>>>> changeset:   17846:09dd5999401b
>>>> user:        Keir Fraser <keir.fraser@citrix.com>
>>>> date:        Thu Jun 12 18:14:00 2008 +0100
>>>> files:       xen/arch/x86/domain.c xen/arch/x86/domain_build.c
>>>> xen/arch/x86/mm.c
>>>> description:
>>>> x86: remove use of per-domain lock from page table entry handling
>>>>
>>>> This change results in a 5% performance improvement for kernel builds
>>>> on dual-socket quad-core systems (which is what I used for reference
>>>> for both 32- and 64-bit). Along with that, the amount of time reported
>>>> as spent in the kernel gets reduced by almost 25% (the fraction of
>>>> time spent in the kernel is generally reported significantly higher
>>>> under Xen than with a native kernel).
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@novell.com>
>>>> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
>>>>
>>>> Tim.
Andres Lagar-Cavilla
2011-Dec-09 14:53 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
>>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
>>>> wrote:
>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>>> the series.
>>>>
>>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>
>>> Nak, I'm afraid.
>>>
>>> These were OK as local functions but if they're going to be made
>>> generally visible, they need clear comments describing what this
>>> locking protects and what the discipline is for avoiding deadlocks.
>>
>> How about something along the lines of
>> "page_lock() is used for two purposes: pte serialization, and memory
>> sharing. All users of page lock for pte serialization live in mm.c, use it
>> to lock a page table page during pte updates, do not take other locks
>> within the critical section delimited by page_lock/unlock, and perform no
>> nesting. All users of page lock for memory sharing live in
>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
>> locking) two pages -- deadlock is avoided by locking pages in increasing
>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>> critical section. These two users (pte serialization and memory sharing)
>> should never collide, as long as page table pages are properly unshared
>> prior to updating."
>
> This would seem to me like very undesirable lock ordering - a very
> coarse (per-domain) lock taken inside a very fine grained (per-page)
> one.

I'm not sure I follow. Unshare would do its work, and then pte
serialization would start. The two pieces of code will be disjoint,
locking-wise.

Now it is true that during unshare we need to take the p2m lock to change
the p2m entry. That's a very strong reason to make the p2m lock
fine-grained. But I need to start somewhere, so I'm breaking up the global
shr_lock first.

>
>> Now those are all pretty words, but here are the two things I (think I)
>> need to do:
>> - Prior to updating pte's, we do get_gfn on the page table page. We should
>> be using get_gfn_unshare. Regardless of this patch. It's likely never
>> going to trigger an actual unshare, yet better safe than sorry.
>
> Does memory sharing work on pv domains at all?

Not. At. All :)

I can _not_ add the unshare. It's idempotent to pv.

>
>> - I can wrap uses of page_lock in mm sharing in an "external"
>> order-enforcing construct from mm-locks.h. And use that to scream deadlock
>> between page_lock and p2m_lock.
>>
>> The code that actually uses page_lock()s in the memory sharing code can be
>> found in "[PATCH] x86/mm: Eliminate global lock in mem sharing code". It
>> already orders locking of individual pages in ascending order.
>
> It should be this patch to make the functions externally visible, not a
> separate one (or at the very minimum the two ought to be in the same
> series, back to back). Which is not to say that I'm fully convinced this
> is a good idea.

Whichever you prefer. I'm of the mind of making shorter patches when
possible, that do one thing, to ease readability. But I can collapse the
two.

Thanks
Andres

>
> Jan
Tim Deegan
2011-Dec-09 14:57 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
At 18:54 -0800 on 08 Dec (1323370449), Andres Lagar-Cavilla wrote:
> > At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
> >> This is necessary for a new consumer of page_lock/unlock to follow in
> >> the series.
> >>
> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> >
> > Nak, I'm afraid.
> >
> > These were OK as local functions but if they're going to be made
> > generally visible, they need clear comments describing what this
> > locking protects and what the discipline is for avoiding deadlocks.
>
> How about something along the lines of
> "page_lock() is used for two purposes: pte serialization, and memory
> sharing. All users of page lock for pte serialization live in mm.c, use it
> to lock a page table page during pte updates, do not take other locks
> within the critical section delimited by page_lock/unlock, and perform no
> nesting. All users of page lock for memory sharing live in
> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
> locking) two pages -- deadlock is avoided by locking pages in increasing
> order. Memory sharing may take the p2m_lock within a page_lock/unlock
> critical section. These two users (pte serialization and memory sharing)
> should never collide, as long as page table pages are properly unshared
> prior to updating."

Thanks. Extracting from that and from Keir's response:

It serves both as a spinlock and to exclude any changes to the page-type of
the page in question (by causing the get_page_type() functions to spin).

What it currently protects is all modifications to pages that have
pagetable type. This serialises PV PTE validations, both against other
validations of the same PTE and against concurrent changes of the
enclosing page's type.

Your planned use is to protect updates to the page-sharing state
associated with a page. Can you be clear about what exactly is protected?

The proposed locking discipline is that
 - page locks must be taken in increasing order of MFN, yes?
 - and that you must always take page locks before the p2m lock?

Is that about right?

Tim.
Andres Lagar-Cavilla
2011-Dec-09 14:59 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
> At 18:54 -0800 on 08 Dec (1323370449), Andres Lagar-Cavilla wrote:
>> > At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>> >> This is necessary for a new consumer of page_lock/unlock to follow in
>> >> the series.
>> >>
>> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>> >
>> > Nak, I'm afraid.
>> >
>> > These were OK as local functions but if they're going to be made
>> > generally visible, they need clear comments describing what this
>> > locking protects and what the discipline is for avoiding deadlocks.
>>
>> How about something along the lines of
>> "page_lock() is used for two purposes: pte serialization, and memory
>> sharing. All users of page lock for pte serialization live in mm.c, use it
>> to lock a page table page during pte updates, do not take other locks
>> within the critical section delimited by page_lock/unlock, and perform no
>> nesting. All users of page lock for memory sharing live in
>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
>> locking) two pages -- deadlock is avoided by locking pages in increasing
>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>> critical section. These two users (pte serialization and memory sharing)
>> should never collide, as long as page table pages are properly unshared
>> prior to updating."
>
> Thanks. Extracting from that and from Keir's response:
>
> It serves both as a spinlock and to exclude any changes to the page-type of
> the page in question (by causing the get_page_type() functions to spin).
>
> What it currently protects is all modifications to pages that have
> pagetable type. This serialises PV PTE validations, both against other
> validations of the same PTE and against concurrent changes of the
> enclosing page's type.
>
> Your planned use is to protect updates to the page-sharing state
> associated with a page. Can you be clear about what exactly is protected?

"Hanging" from a shared page, there is a list of all the gfns it backs.
These are (gfn, domain) tuples. So every time we share or unshare, we need
protected access to that list.

> The proposed locking discipline is that
>  - page locks must be taken in increasing order of MFN, yes?
>  - and that you must always take page locks before the p2m lock?

Yes and yes.

> Is that about right?

Indeed.

Thanks
Andres

> Tim.
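In other words, the page lock becomes the guard for the per-frame gfn list.
A sketch of the intended pattern follows; the sharing back-pointer field
name is invented here, while gfn_info_t and its list member are as in the
earlier patches of this series. Not the actual series code:

    /* Walk the (gfn, domain) tuples backed by a shared frame; the list
     * may only be traversed or modified under the frame's page lock. */
    static void walk_shared_gfns(struct page_info *page)
    {
        struct list_head *le;

        page_lock(page);
        list_for_each(le, &page->sharing->gfns)   /* field name invented */
        {
            gfn_info_t *g = list_entry(le, gfn_info_t, list);
            (void)g;   /* each entry is one (gfn, domain) tuple */
        }
        page_unlock(page);
    }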
Keir Fraser
2011-Dec-09 15:02 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
On 09/12/2011 14:57, "Tim Deegan" <tim@xen.org> wrote:

> At 18:54 -0800 on 08 Dec (1323370449), Andres Lagar-Cavilla wrote:
>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>>> the series.
>>>>
>>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>
>>> Nak, I'm afraid.
>>>
>>> These were OK as local functions but if they're going to be made
>>> generally visible, they need clear comments describing what this
>>> locking protects and what the discipline is for avoiding deadlocks.
>>
>> How about something along the lines of
>> "page_lock() is used for two purposes: pte serialization, and memory
>> sharing. All users of page lock for pte serialization live in mm.c, use it
>> to lock a page table page during pte updates, do not take other locks
>> within the critical section delimited by page_lock/unlock, and perform no
>> nesting. All users of page lock for memory sharing live in
>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
>> locking) two pages -- deadlock is avoided by locking pages in increasing
>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>> critical section. These two users (pte serialization and memory sharing)
>> should never collide, as long as page table pages are properly unshared
>> prior to updating."
>
> Thanks. Extracting from that and from Keir's response:
>
> It serves both as a spinlock and to exclude any changes to the page-type of
> the page in question (by causing the get_page_type() functions to spin).

It does not cause the get_page_type() functions to spin. An attempt to get
another reference to the current type will succeed; an attempt to change
the type will immediately fail. From the p.o.v. of the type-tracking logic,
page_lock() simply takes a reference to the current type.

 -- Keir

> What it currently protects is all modifications to pages that have
> pagetable type. This serialises PV PTE validations, both against other
> validations of the same PTE and against concurrent changes of the
> enclosing page's type.
>
> Your planned use is to protect updates to the page-sharing state
> associated with a page. Can you be clear about what exactly is protected?
>
> The proposed locking discipline is that
>  - page locks must be taken in increasing order of MFN, yes?
>  - and that you must always take page locks before the p2m lock?
>
> Is that about right?
>
> Tim.
Jan Beulich
2011-Dec-09 15:06 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
>>> On 09.12.11 at 15:53, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
>>>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
>>>>> wrote:
>>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>>>> the series.
>>>>>
>>>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>
>>>> Nak, I'm afraid.
>>>>
>>>> These were OK as local functions but if they're going to be made
>>>> generally visible, they need clear comments describing what this
>>>> locking protects and what the discipline is for avoiding deadlocks.
>>>
>>> How about something along the lines of
>>> "page_lock() is used for two purposes: pte serialization, and memory
>>> sharing. All users of page lock for pte serialization live in mm.c, use it
>>> to lock a page table page during pte updates, do not take other locks
>>> within the critical section delimited by page_lock/unlock, and perform no
>>> nesting. All users of page lock for memory sharing live in
>>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
>>> locking) two pages -- deadlock is avoided by locking pages in increasing
>>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>>> critical section. These two users (pte serialization and memory sharing)
>>> should never collide, as long as page table pages are properly unshared
>>> prior to updating."
>>
>> This would seem to me like very undesirable lock ordering - a very
>> coarse (per-domain) lock taken inside a very fine grained (per-page)
>> one.
> I'm not sure I follow. Unshare would do its work, and then pte
> serialization would start. The two pieces of code will be disjoint,
> locking-wise.

But your original mail said "Memory sharing may take the p2m_lock
within a page_lock/unlock critical section" - see above. That's what
I'm referring to.

> Now it is true that during unshare we need to take the p2m lock to change
> the p2m entry. That's a very strong reason to make the p2m lock
> fine-grained. But I need to start somewhere, so I'm breaking up the global
> shr_lock first.

I don't really think that it'll be reasonable to split up the p2m lock.

>>> Now those are all pretty words, but here are the two things I (think I)
>>> need to do:
>>> - Prior to updating pte's, we do get_gfn on the page table page. We should
>>> be using get_gfn_unshare. Regardless of this patch. It's likely never
>>> going to trigger an actual unshare, yet better safe than sorry.
>>
>> Does memory sharing work on pv domains at all?
> Not. At. All :)
>
> I can _not_ add the unshare. It's idempotent to pv.

Perhaps I should have clarified why I was asking: The pte handling is
a pv-only thing, and if memory sharing is hvm only, then the two can't
ever conflict.

>>> - I can wrap uses of page_lock in mm sharing in an "external"
>>> order-enforcing construct from mm-locks.h. And use that to scream deadlock
>>> between page_lock and p2m_lock.
>>>
>>> The code that actually uses page_lock()s in the memory sharing code can be
>>> found in "[PATCH] x86/mm: Eliminate global lock in mem sharing code". It
>>> already orders locking of individual pages in ascending order.
>>
>> It should be this patch to make the functions externally visible, not a
>> separate one (or at the very minimum the two ought to be in the same
>> series, back to back). Which is not to say that I'm fully convinced this
>> is a good idea.
> Whichever you prefer. I'm of the mind of making shorter patches when
> possible, that do one thing, to ease readability. But I can collapse the
> two.

In quite a few (recent) cases your patches did something where the user
of the change wasn't obvious at all (in some cases - I tried to point these
out - there was no user even at the end of a series). While I agree that
shorter patches are easier to review, trivial changes like the one here
should imo really be logically grouped with what requires them.

Jan
Andres Lagar-Cavilla
2011-Dec-09 17:34 UTC
Re: [PATCH 12 of 18] x86/mm: Make page_lock/unlock() in arch/x86/mm.c externally callable
>>>> On 09.12.11 at 15:53, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
>>>> wrote:
>>>>>> On 09.12.11 at 03:54, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
>>>>>> wrote:
>>>>> At 02:47 -0500 on 08 Dec (1323312447), Andres Lagar-Cavilla wrote:
>>>>>> This is necessary for a new consumer of page_lock/unlock to follow in
>>>>>> the series.
>>>>>>
>>>>>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>>
>>>>> Nak, I'm afraid.
>>>>>
>>>>> These were OK as local functions but if they're going to be made
>>>>> generally visible, they need clear comments describing what this
>>>>> locking protects and what the discipline is for avoiding deadlocks.
>>>>
>>>> How about something along the lines of
>>>> "page_lock() is used for two purposes: pte serialization, and memory
>>>> sharing. All users of page lock for pte serialization live in mm.c, use it
>>>> to lock a page table page during pte updates, do not take other locks
>>>> within the critical section delimited by page_lock/unlock, and perform no
>>>> nesting. All users of page lock for memory sharing live in
>>>> mm/mem_sharing.c. For memory sharing, nesting may happen when sharing (and
>>>> locking) two pages -- deadlock is avoided by locking pages in increasing
>>>> order. Memory sharing may take the p2m_lock within a page_lock/unlock
>>>> critical section. These two users (pte serialization and memory sharing)
>>>> should never collide, as long as page table pages are properly unshared
>>>> prior to updating."
>>>
>>> This would seem to me like very undesirable lock ordering - a very
>>> coarse (per-domain) lock taken inside a very fine grained (per-page)
>>> one.
>> I'm not sure I follow. Unshare would do its work, and then pte
>> serialization would start. The two pieces of code will be disjoint,
>> locking-wise.
>
> But your original mail said "Memory sharing may take the p2m_lock
> within a page_lock/unlock critical section" - see above. That's what
> I'm referring to.
>
>> Now it is true that during unshare we need to take the p2m lock to change
>> the p2m entry. That's a very strong reason to make the p2m lock
>> fine-grained. But I need to start somewhere, so I'm breaking up the global
>> shr_lock first.
>
> I don't really think that it'll be reasonable to split up the p2m lock.
>
>>>> Now those are all pretty words, but here are the two things I (think I)
>>>> need to do:
>>>> - Prior to updating pte's, we do get_gfn on the page table page. We should
>>>> be using get_gfn_unshare. Regardless of this patch. It's likely never
>>>> going to trigger an actual unshare, yet better safe than sorry.
>>>
>>> Does memory sharing work on pv domains at all?
>> Not. At. All :)
>>
>> I can _not_ add the unshare. It's idempotent to pv.
>
> Perhaps I should have clarified why I was asking: The pte handling is
> a pv-only thing, and if memory sharing is hvm only, then the two can't
> ever conflict.

Grant mapping uses the page_lock. I believe there is work trying to make
pv backends work in hvm? If so, best to add unshare of the page table page
now. Should a free-wheeling user-space sharer try to share page table
pages....

Andres

>
>>>> - I can wrap uses of page_lock in mm sharing in an "external"
>>>> order-enforcing construct from mm-locks.h. And use that to scream deadlock
>>>> between page_lock and p2m_lock.
>>>>
>>>> The code that actually uses page_lock()s in the memory sharing code can be
>>>> found in "[PATCH] x86/mm: Eliminate global lock in mem sharing code". It
>>>> already orders locking of individual pages in ascending order.
>>>
>>> It should be this patch to make the functions externally visible, not a
>>> separate one (or at the very minimum the two ought to be in the same
>>> series, back to back). Which is not to say that I'm fully convinced this
>>> is a good idea.
>> Whichever you prefer. I'm of the mind of making shorter patches when
>> possible, that do one thing, to ease readability. But I can collapse the
>> two.
> In quite a few (recent) cases your patches did something where the user
> of the change wasn't obvious at all (in some cases - I tried to point these
> out - there was no user even at the end of a series). While I agree that
> shorter patches are easier to review, trivial changes like the one here
> should imo really be logically grouped with what requires them.
>
> Jan