Stefano Stabellini
2013-Aug-15 11:17 UTC
[PATCH v4 0/4] introduce XENMEM_exchange_and_pin and XENMEM_unpin
Hi all, this patch series introduces two new hypercalls to allow autotranslate guests to allocate a contiguous buffer in machine addresses. The XENMEM_exchange_and_pin returns the mfns and makes sure to pin the pages so that the hypervisor won''t change their p2m mappings while in use. XENMEM_unpin simply unpins the pages. Cheers, Stefano Changes in v4: - move steal_page to common code; - introduce a generic p2m walker and use it in p2m_lookup, guest_physmap_pin_range and guest_physmap_unpin_range; - return -EINVAL when the P2M_DMA_PIN check fails; - change the printk into a gdprintk; - add a comment on what type of page can be pinned; - rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin; - rename XENMEM_get_dma_buf to XENMEM_unpin; - move the pinning before we copy back the mfn to the guest; - propagate errors returned by guest_physmap_pin_range; - use xen_memory_exchange_t as parameter for XENMEM_exchange_and_pin; - use an unsigned iterator in unpin; - improve the documentation of the new hypercalls; - add a note about out.address_bits for XENMEM_exchange. Changes in v3: - implement guest_physmap_pin_range and guest_physmap_unpin_range on ARM; - implement XENMEM_put_dma_buf. Changes in v2: - actually don''t print the warning more than once. Stefano Stabellini (4): xen/arm: implement steal_page xen/arm: introduce a generic p2m walker and use it in p2m_lookup xen: implement guest_physmap_(un)pin_range xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin xen/arch/arm/mm.c | 6 -- xen/arch/arm/p2m.c | 147 +++++++++++++++++++++++++++++++++++------- xen/arch/x86/mm.c | 49 -------------- xen/common/memory.c | 116 +++++++++++++++++++++++++++++++++- xen/include/asm-arm/mm.h | 6 +- xen/include/asm-arm/page.h | 7 ++- xen/include/asm-x86/mm.h | 2 - xen/include/asm-x86/p2m.h | 12 ++++ xen/include/public/memory.h | 38 +++++++++++ xen/include/xen/mm.h | 1 + 10 files changed, 297 insertions(+), 87 deletions(-) git://xenbits.xen.org/people/sstabellini/xen-unstable.git exchange_and_pin_3
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Changes in v4: - move steal_page to common code. --- xen/arch/arm/mm.c | 6 ----- xen/arch/x86/mm.c | 49 ---------------------------------------------- xen/common/memory.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/mm.h | 2 - xen/include/asm-x86/mm.h | 2 - xen/include/xen/mm.h | 1 + 6 files changed, 50 insertions(+), 59 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index f301e65..b26ca3e 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -748,12 +748,6 @@ int donate_page(struct domain *d, struct page_info *page, unsigned int memflags) return -ENOSYS; } -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags) -{ - return -1; -} - int page_is_ram_type(unsigned long mfn, unsigned long mem_type) { ASSERT(0); diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index e7f0e13..5bcbb52 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4160,55 +4160,6 @@ int donate_page( return -1; } -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags) -{ - unsigned long x, y; - bool_t drop_dom_ref = 0; - - spin_lock(&d->page_alloc_lock); - - if ( is_xen_heap_page(page) || (page_get_owner(page) != d) ) - goto fail; - - /* - * We require there is just one reference (PGC_allocated). We temporarily - * drop this reference now so that we can safely swizzle the owner. - */ - y = page->count_info; - do { - x = y; - if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) ) - goto fail; - y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask); - } while ( y != x ); - - /* Swizzle the owner then reinstate the PGC_allocated reference. */ - page_set_owner(page, NULL); - y = page->count_info; - do { - x = y; - BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated); - } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x ); - - /* Unlink from original owner. */ - if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) ) - drop_dom_ref = 1; - page_list_del(page, &d->page_list); - - spin_unlock(&d->page_alloc_lock); - if ( unlikely(drop_dom_ref) ) - put_domain(d); - return 0; - - fail: - spin_unlock(&d->page_alloc_lock); - MEM_LOG("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%" PRtype_info, - (void *)page_to_mfn(page), d, d->domain_id, - page_get_owner(page), page->count_info, page->u.inuse.type_info); - return -1; -} - static int __do_update_va_mapping( unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner) { diff --git a/xen/common/memory.c b/xen/common/memory.c index 50b740f..422b274 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -230,6 +230,55 @@ int guest_remove_page(struct domain *d, unsigned long gmfn) return 1; } +int steal_page( + struct domain *d, struct page_info *page, unsigned int memflags) +{ + unsigned long x, y; + bool_t drop_dom_ref = 0; + + spin_lock(&d->page_alloc_lock); + + if ( is_xen_heap_page(page) || (page_get_owner(page) != d) ) + goto fail; + + /* + * We require there is just one reference (PGC_allocated). We temporarily + * drop this reference now so that we can safely swizzle the owner. + */ + y = page->count_info; + do { + x = y; + if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) ) + goto fail; + y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask); + } while ( y != x ); + + /* Swizzle the owner then reinstate the PGC_allocated reference. */ + page_set_owner(page, NULL); + y = page->count_info; + do { + x = y; + BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated); + } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x ); + + /* Unlink from original owner. */ + if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) ) + drop_dom_ref = 1; + page_list_del(page, &d->page_list); + + spin_unlock(&d->page_alloc_lock); + if ( unlikely(drop_dom_ref) ) + put_domain(d); + return 0; + + fail: + spin_unlock(&d->page_alloc_lock); + printk("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%lx\n", + (void *)page_to_mfn(page), d, d->domain_id, + page_get_owner(page), page->count_info, page->u.inuse.type_info); + return -1; +} + static void decrease_reservation(struct memop_args *a) { unsigned long i, j; diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 5e7c5a3..df6fea9 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -297,8 +297,6 @@ static inline int relinquish_shared_pages(struct domain *d) /* Arch-specific portion of memory_op hypercall. */ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags); int donate_page( struct domain *d, struct page_info *page, unsigned int memflags); diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index 213fc9c..65485ae 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -566,8 +566,6 @@ long subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); int compat_arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void)); int compat_subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void)); -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags); int donate_page( struct domain *d, struct page_info *page, unsigned int memflags); diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h index 4f5795c..416d0e7 100644 --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -60,6 +60,7 @@ void destroy_xen_mappings(unsigned long v, unsigned long e); unsigned long domain_adjust_tot_pages(struct domain *d, long pages); int domain_set_outstanding_pages(struct domain *d, unsigned long pages); void get_outstanding_claims(uint64_t *free_pages, uint64_t *outstanding_pages); +int steal_page(struct domain *d, struct page_info *page, unsigned int memflags); /* Domain suballocator. These functions are *not* interrupt-safe.*/ void init_domheap_pages(paddr_t ps, paddr_t pe); -- 1.7.2.5
Stefano Stabellini
2013-Aug-15 11:18 UTC
[PATCH v4 2/4] xen/arm: introduce a generic p2m walker and use it in p2m_lookup
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- xen/arch/arm/p2m.c | 104 +++++++++++++++++++++++++++++++++++++++------------- 1 files changed, 78 insertions(+), 26 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 307c6d4..a77df9a 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -30,49 +30,101 @@ void p2m_load_VTTBR(struct domain *d) isb(); /* Ensure update is visible */ } -/* - * Lookup the MFN corresponding to a domain''s PFN. - * - * There are no processor functions to do a stage 2 only lookup therefore we - * do a a software walk. - */ -paddr_t p2m_lookup(struct domain *d, paddr_t paddr) +static int p2m_walker(struct domain *d, paddr_t paddr, unsigned int order, + int (*func)(lpae_t *pte, void *arg), void* arg) { + lpae_t *first = NULL, *second = NULL, *third = NULL; struct p2m_domain *p2m = &d->arch.p2m; - lpae_t pte, *first = NULL, *second = NULL, *third = NULL; - paddr_t maddr = INVALID_PADDR; + int rc = -EFAULT, i; + unsigned long cur_first_offset = ~0, cur_second_offset = ~0; spin_lock(&p2m->lock); first = __map_domain_page(p2m->first_level); - pte = first[first_table_offset(paddr)]; - if ( !pte.p2m.valid || !pte.p2m.table ) - goto done; + if ( !first || + !first[first_table_offset(paddr)].p2m.valid || + !first[first_table_offset(paddr)].p2m.table ) + goto err; + + for ( i = 0; i < (1UL << order); i++ ) + { + if ( cur_first_offset != first_table_offset(paddr) ) + { + if (second) unmap_domain_page(second); + second = map_domain_page(first[first_table_offset(paddr)].p2m.base); + cur_first_offset = first_table_offset(paddr); + } + if ( !second || + !second[second_table_offset(paddr)].p2m.valid || + !second[second_table_offset(paddr)].p2m.table ) + goto err; + + if ( cur_second_offset != second_table_offset(paddr) ) + { + if (third) unmap_domain_page(third); + third = map_domain_page(second[second_table_offset(paddr)].p2m.base); + cur_second_offset = second_table_offset(paddr); + } + if ( !third || + !third[third_table_offset(paddr)].p2m.valid ) + goto err; + + rc = func(&third[third_table_offset(paddr)], arg); + if ( rc != 0 ) + return rc; + + paddr += PAGE_SIZE; + } + + rc = 0; + +err: + if ( third ) unmap_domain_page(third); + if ( second ) unmap_domain_page(second); + if ( first ) unmap_domain_page(first); - second = map_domain_page(pte.p2m.base); - pte = second[second_table_offset(paddr)]; - if ( !pte.p2m.valid || !pte.p2m.table ) - goto done; + spin_unlock(&p2m->lock); + + return rc; +} + +struct p2m_lookup_t { + paddr_t paddr; + paddr_t maddr; +}; - third = map_domain_page(pte.p2m.base); - pte = third[third_table_offset(paddr)]; +static int _p2m_lookup(lpae_t *ptep, void *arg) +{ + lpae_t pte; + struct p2m_lookup_t *p2m = (struct p2m_lookup_t *)arg; + + pte = *ptep; /* This bit must be one in the level 3 entry */ if ( !pte.p2m.table ) pte.bits = 0; -done: if ( pte.p2m.valid ) - maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr & ~PAGE_MASK); - - if (third) unmap_domain_page(third); - if (second) unmap_domain_page(second); - if (first) unmap_domain_page(first); + p2m->maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | + (p2m->paddr & ~PAGE_MASK); + return 0; +} +/* + * Lookup the MFN corresponding to a domain''s PFN. + * + * There are no processor functions to do a stage 2 only lookup therefore we + * do a a software walk. + */ +paddr_t p2m_lookup(struct domain *d, paddr_t paddr) +{ + struct p2m_lookup_t p2m; + p2m.paddr = paddr; + p2m.maddr = INVALID_PADDR; - spin_unlock(&p2m->lock); + p2m_walker(d, paddr, 0, _p2m_lookup, &p2m); - return maddr; + return p2m.maddr; } int guest_physmap_mark_populate_on_demand(struct domain *d, -- 1.7.2.5
Stefano Stabellini
2013-Aug-15 11:18 UTC
[PATCH v4 3/4] xen: implement guest_physmap_(un)pin_range
guest_physmap_pin_range pins a range of guest pages so that their p2m mappings won''t be changed. guest_physmap_unpin_range unpins the previously pinned pages. The pinning is done using one of the spare bits in the p2m ptes. Use the newly introduce p2m_walker to implement the two functions on ARM. Provide empty stubs for x86. Changes in v4: - use p2m_walker to implement guest_physmap_pin_range and guest_physmap_unpin_range; - return -EINVAL when the P2M_DMA_PIN check fails; - change the printk into a gdprintk; - add a comment on what type of page can be pinned. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> --- xen/arch/arm/p2m.c | 45 ++++++++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/mm.h | 4 +++ xen/include/asm-arm/page.h | 7 +++++- xen/include/asm-x86/p2m.h | 12 +++++++++++ 4 files changed, 67 insertions(+), 1 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index a77df9a..c185328 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -127,6 +127,43 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr) return p2m.maddr; } + +static int _guest_physmap_pin_range(lpae_t *ptep, void *arg) +{ + lpae_t pte = *ptep; + if ( pte.p2m.avail & P2M_DMA_PIN ) + return -EINVAL; + pte.p2m.avail |= P2M_DMA_PIN; + write_pte(ptep, pte); + return 0; +} + +int guest_physmap_pin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int order) +{ + return p2m_walker(d, gpfn << PAGE_SHIFT, order, + _guest_physmap_pin_range, NULL); +} + +static int _guest_physmap_unpin_range(lpae_t *ptep, void *arg) +{ + lpae_t pte = *ptep; + if ( !pte.p2m.avail & P2M_DMA_PIN ) + return -EINVAL; + pte.p2m.avail &= ~P2M_DMA_PIN; + write_pte(ptep, pte); + return 0; +} + +int guest_physmap_unpin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int order) +{ + return p2m_walker(d, gpfn << PAGE_SHIFT, order, + _guest_physmap_unpin_range, NULL); +} + int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, unsigned int order) @@ -236,6 +273,14 @@ static int create_p2m_entries(struct domain *d, cur_second_offset = second_table_offset(addr); } + if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN ) + { + rc = -EINVAL; + gdprintk(XENLOG_WARNING, "cannot change p2m mapping for paddr=%"PRIpaddr + " domid=%d, the page is pinned\n", addr, d->domain_id); + goto out; + } + flush = third[third_table_offset(addr)].p2m.valid; /* Allocate a new RAM page and attach */ diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index df6fea9..34167e6 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -317,6 +317,10 @@ void free_init_memory(void); int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, unsigned int order); +int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn, + unsigned int order); +int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn, + unsigned int order); extern void put_page_type(struct page_info *page); static inline void put_page_and_type(struct page_info *page) diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h index 41e9eff..c08cfca 100644 --- a/xen/include/asm-arm/page.h +++ b/xen/include/asm-arm/page.h @@ -153,11 +153,16 @@ typedef struct { unsigned long hint:1; /* In a block of 16 contiguous entries */ unsigned long sbz2:1; unsigned long xn:1; /* eXecute-Never */ - unsigned long avail:4; /* Ignored by hardware */ + unsigned long avail:4; /* Ignored by hardware, see below */ unsigned long sbz1:5; } __attribute__((__packed__)) lpae_p2m_t; +/* Xen "avail" bits allocation in third level entries */ +#define P2M_DMA_PIN (1<<0) /* The page has been "pinned": the hypervisor + promises not to change the p2m mapping. + Only normal r/w guest RAM can be pinned. */ + /* * Walk is the common bits of p2m and pt entries which are needed to * simply walk the table (e.g. for debug). diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 43583b2..b08a722 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -492,6 +492,18 @@ void guest_physmap_remove_page(struct domain *d, /* Set a p2m range as populate-on-demand */ int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, unsigned int order); +static inline int guest_physmap_pin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int order) +{ + return -ENOSYS; +} +static inline int guest_physmap_unpin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int order) +{ + return -ENOSYS; +} /* Change types across all p2m entries in a domain */ void p2m_change_entry_type_global(struct domain *d, -- 1.7.2.5
Stefano Stabellini
2013-Aug-15 11:18 UTC
[PATCH v4 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
Introduce two new hypercalls XENMEM_exchange_and_pin and XENMEM_unpin. XENMEM_exchange_and_pin, it''s like XENMEM_exchange but it also pins the new pages: their p2m mapping are guaranteed not to be changed, until XENMEM_unpin is called. XENMEM_exchange_and_pin returns the DMA frame numbers of the new pages to the caller, even if it''s an autotranslate guest. The only effect of XENMEM_unpin is to "unpin" the previously pinned pages. Afterwards the p2m mappings can be transparently changed by the hypervisor as normal. The memory remains accessible from the guest. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Changes in v4: - rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin; - rename XENMEM_get_dma_buf to XENMEM_unpin; - move the pinning before we copy back the mfn to the guest; - propagate errors returned by guest_physmap_pin_range; - use xen_memory_exchange_t as parameter for XENMEM_exchange_and_pin; - use an unsigned iterator in unpin; - improve the documentation of the new hypercalls; - add a note about out.address_bits for XENMEM_exchange. --- xen/common/memory.c | 67 +++++++++++++++++++++++++++++++++++++++++- xen/include/public/memory.h | 38 ++++++++++++++++++++++++ 2 files changed, 103 insertions(+), 2 deletions(-) diff --git a/xen/common/memory.c b/xen/common/memory.c index 422b274..5d91d93 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -328,7 +328,7 @@ static void decrease_reservation(struct memop_args *a) a->nr_done = i; } -static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) +static long memory_exchange(int op, XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) { struct xen_memory_exchange exch; PAGE_LIST_HEAD(in_chunk_list); @@ -549,6 +549,15 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) { for ( k = 0; k < (1UL << exch.out.extent_order); k++ ) set_gpfn_from_mfn(mfn + k, gpfn + k); + } + if ( op == XENMEM_exchange_and_pin ) + { + rc = guest_physmap_pin_range(d, gpfn, exch.out.extent_order); + if ( rc ) + continue; + } + if ( op == XENMEM_exchange_and_pin || !paging_mode_translate(d) ) + { if ( __copy_to_guest_offset(exch.out.extent_start, (i << out_chunk_order) + j, &mfn, 1) ) @@ -591,6 +600,55 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) return rc; } +static long unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg) +{ + int rc; + xen_ulong_t i; + struct xen_unpin unpin; + xen_pfn_t gpfn; + struct domain *d; + + if ( copy_from_guest(&unpin, arg, 1) ) + return -EFAULT; + + /* Various sanity checks. */ + if ( /* Extent orders are sensible? */ + (unpin.in.extent_order > MAX_ORDER) || + /* Sizes of input list do not overflow a long? */ + ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) ) + return -EFAULT; + + if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) ) + return -EFAULT; + + d = rcu_lock_domain_by_any_id(unpin.in.domid); + if ( d == NULL ) + { + rc = -ESRCH; + goto fail; + } + + for ( i = 0; i < unpin.in.nr_extents; i++ ) + { + if ( unlikely(__copy_from_guest_offset( + &gpfn, unpin.in.extent_start, i, 1)) ) + { + rc = -EFAULT; + goto fail; + } + + rc = guest_physmap_unpin_range(d, gpfn, unpin.in.extent_order); + if ( rc ) + goto fail; + } + + rc = 0; + + fail: + rcu_unlock_domain(d); + return rc; +} + long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { struct domain *d; @@ -679,8 +737,13 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; + case XENMEM_exchange_and_pin: case XENMEM_exchange: - rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t)); + rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t)); + break; + + case XENMEM_unpin: + rc = unpin(guest_handle_cast(arg, xen_unpin_t)); break; case XENMEM_maximum_ram_page: diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index 7a26dee..eb1c2a4 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -105,6 +105,7 @@ struct xen_memory_exchange { /* * [IN] Details of memory extents to be exchanged (GMFN bases). * Note that @in.address_bits is ignored and unused. + * @out.address_bits should contain the address mask for the new pages. */ struct xen_memory_reservation in; @@ -459,6 +460,43 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t); * The zero value is appropiate. */ +#define XENMEM_exchange_and_pin 26 +/* + * This hypercall is similar to XENMEM_exchange: it takes the same + * struct as an argument and it exchanges the pages passed in with a new + * set of pages. The new pages are going to be "pinned": it''s guaranteed + * that their p2m mapping won''t be changed until explicitly "unpinned". + * The content of the exchanged pages is lost. + * Only normal guest r/w memory can be pinned: no granted pages or + * ballooned pages. + * If return code is zero then @out.extent_list provides the DMA frame + * numbers of the newly-allocated memory. + * Returns zero on complete success, otherwise a negative error code: + * -ENOSYS if not implemented + * -EINVAL if the page is already pinned + * -EFAULT if the physical to machine walk fails + * On complete success then always @nr_exchanged == @in.nr_extents. On + * partial success @nr_exchanged indicates how much work was done and a + * negative error code is returned. + */ + +#define XENMEM_unpin 27 +/* + * XENMEM_unpin unpins a set of pages, previously pinned by + * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can + * be transparently changed by the hypervisor, as usual. The pages are + * still accessible from the guest. + */ +struct xen_unpin { + /* + * [IN] Details of memory extents to be unpinned (GMFN bases). + * Note that @in.address_bits is ignored and unused. + */ + struct xen_memory_reservation in; +}; +typedef struct xen_unpin xen_unpin_t; +DEFINE_XEN_GUEST_HANDLE(xen_unpin_t); + #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */ #endif /* __XEN_PUBLIC_MEMORY_H__ */ -- 1.7.2.5
Jan Beulich
2013-Aug-15 13:03 UTC
Re: [PATCH v4 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
>>> On 15.08.13 at 13:18, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote: > @@ -549,6 +549,15 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > { > for ( k = 0; k < (1UL << exch.out.extent_order); k++ ) > set_gpfn_from_mfn(mfn + k, gpfn + k); > + } > + if ( op == XENMEM_exchange_and_pin ) > + { > + rc = guest_physmap_pin_range(d, gpfn, exch.out.extent_order); > + if ( rc ) > + continue;I guess you decided to use "continue" here because the other paths do so too. Here, however, I don''t think that''s correct: The pinning is an essential part of the operation, and the failure may get masked by a later successful pin (because that could flush rc back to zero) - as much as any other earlier failure that resulted in rc being -EFAULT. So I''m afraid you have to go the unpleasant route here and do the proper rollback if you encounter a failure here. Or see whether the pinning can be moved even earlier.> +static long unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg) > +{ > + int rc; > + xen_ulong_t i;Wasn''t it that xen_ulong_t is 64 bits wide on ARM32? If so, this isn''t suitable as an iterator used to index arrays (or really array like constructs), and compilers might even warn about this.> + struct xen_unpin unpin; > + xen_pfn_t gpfn; > + struct domain *d; > + > + if ( copy_from_guest(&unpin, arg, 1) ) > + return -EFAULT; > + > + /* Various sanity checks. */ > + if ( /* Extent orders are sensible? */ > + (unpin.in.extent_order > MAX_ORDER) || > + /* Sizes of input list do not overflow a long? */ > + ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) )Since you already bound check nr_extents against this unsigned long value, making i unsigned long would be just fine.> +struct xen_unpin { > + /* > + * [IN] Details of memory extents to be unpinned (GMFN bases). > + * Note that @in.address_bits is ignored and unused. > + */ > + struct xen_memory_reservation in; > +};While the [IN] above is in line with your implementation - how does a guest learn about partial success of the operation? Jan
On Thu, 2013-08-15 at 12:18 +0100, Stefano Stabellini wrote:> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>You needed to include the x86 folks here. Adding Keir and Jan. Subject needs to change, you are no longer implementing steal_page but moving it. I assume it is unchanged but please say so in the commit message.> + /* Unlink from original owner. */ > + if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )We never call domain_adjust_tot_pages from anywhere on ARM -- I wonder if we should be ;-) Ah, the counter to this drop is in donate_page() -- which is define right above steal_page on x86 and is an ASSERT(0) on x86. I don''t think it makes sense to move one without the other. Ian.
Ian Campbell
2013-Aug-19 16:30 UTC
Re: [PATCH v4 2/4] xen/arm: introduce a generic p2m walker and use it in p2m_lookup
On Thu, 2013-08-15 at 12:18 +0100, Stefano Stabellini wrote:> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > --- > xen/arch/arm/p2m.c | 104 +++++++++++++++++++++++++++++++++++++++------------- > 1 files changed, 78 insertions(+), 26 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index 307c6d4..a77df9a 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -30,49 +30,101 @@ void p2m_load_VTTBR(struct domain *d) > isb(); /* Ensure update is visible */ > } > > -/* > - * Lookup the MFN corresponding to a domain''s PFN. > - * > - * There are no processor functions to do a stage 2 only lookup therefore we > - * do a a software walk. > - */Please replace with a comment describing the semantics of this function, e.g. it fails on superpages etc, order == page order (not bytes), only calls func for the leafs etc.> -paddr_t p2m_lookup(struct domain *d, paddr_t paddr) > +static int p2m_walker(struct domain *d, paddr_t paddr, unsigned int order, > + int (*func)(lpae_t *pte, void *arg), void* arg) > { > + lpae_t *first = NULL, *second = NULL, *third = NULL; > struct p2m_domain *p2m = &d->arch.p2m; > - lpae_t pte, *first = NULL, *second = NULL, *third = NULL; > - paddr_t maddr = INVALID_PADDR; > + int rc = -EFAULT, i; > + unsigned long cur_first_offset = ~0, cur_second_offset = ~0; > > spin_lock(&p2m->lock); > > first = __map_domain_page(p2m->first_level); > > - pte = first[first_table_offset(paddr)]; > - if ( !pte.p2m.valid || !pte.p2m.table ) > - goto done; > + if ( !first || > + !first[first_table_offset(paddr)].p2m.valid || > + !first[first_table_offset(paddr)].p2m.table ) > + goto err; > + > + for ( i = 0; i < (1UL << order); i++ ) > + { > + if ( cur_first_offset != first_table_offset(paddr) ) > + { > + if (second) unmap_domain_page(second); > + second = map_domain_page(first[first_table_offset(paddr)].p2m.base); > + cur_first_offset = first_table_offset(paddr); > + } > + if ( !second || > + !second[second_table_offset(paddr)].p2m.valid || > + !second[second_table_offset(paddr)].p2m.table )Can you align the ! please. Here and elsewhere.> + goto err;On the second iteration of this loop rc == 0 since the first call the func() has succeeded. So this goto doesn''t actually end up reporting an error. I think you need to set rc at the top of the loop. It''s not clear that -EFAULT is the right reaction to a super-page mapping. Perhaps func should grow a level parameter and be called for all valid levels, or at least it should be an ASSERT, since if the current semantics of the func are "no superpages" then the caller is buggy (although it would likely be made unbuggy by fixing this function). Either way a doc comment would help.> + > + if ( cur_second_offset != second_table_offset(paddr) ) > + { > + if (third) unmap_domain_page(third); > + third = map_domain_page(second[second_table_offset(paddr)].p2m.base); > + cur_second_offset = second_table_offset(paddr); > + } > + if ( !third || > + !third[third_table_offset(paddr)].p2m.valid ) > + goto err; > + > + rc = func(&third[third_table_offset(paddr)], arg); > + if ( rc != 0 ) > + return rc;Need to use err so as to unmap all the mapped pages and drop the p2m lock.> + > + paddr += PAGE_SIZE; > + } > + > + rc = 0; > + > +err: > + if ( third ) unmap_domain_page(third); > + if ( second ) unmap_domain_page(second); > + if ( first ) unmap_domain_page(first); > > - second = map_domain_page(pte.p2m.base); > - pte = second[second_table_offset(paddr)]; > - if ( !pte.p2m.valid || !pte.p2m.table ) > - goto done; > + spin_unlock(&p2m->lock); > + > + return rc; > +} > + > +struct p2m_lookup_t { > + paddr_t paddr; > + paddr_t maddr; > +}; > > - third = map_domain_page(pte.p2m.base); > - pte = third[third_table_offset(paddr)]; > +static int _p2m_lookup(lpae_t *ptep, void *arg) > +{ > + lpae_t pte; > + struct p2m_lookup_t *p2m = (struct p2m_lookup_t *)arg; > + > + pte = *ptep; > > /* This bit must be one in the level 3 entry */ > if ( !pte.p2m.table ) > pte.bits = 0;not return -EFAULT?> -done: > if ( pte.p2m.valid ) > - maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr & ~PAGE_MASK); > - > - if (third) unmap_domain_page(third); > - if (second) unmap_domain_page(second); > - if (first) unmap_domain_page(first); > + p2m->maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | > + (p2m->paddr & ~PAGE_MASK); > + return 0; > +} > +/* > + * Lookup the MFN corresponding to a domain''s PFN. > + * > + * There are no processor functions to do a stage 2 only lookup therefore we > + * do a a software walk. > + */ > +paddr_t p2m_lookup(struct domain *d, paddr_t paddr) > +{ > + struct p2m_lookup_t p2m; > + p2m.paddr = paddr; > + p2m.maddr = INVALID_PADDR; > > - spin_unlock(&p2m->lock); > + p2m_walker(d, paddr, 0, _p2m_lookup, &p2m); > > - return maddr; > + return p2m.maddr; > } > > int guest_physmap_mark_populate_on_demand(struct domain *d,
Ian Campbell
2013-Aug-19 16:35 UTC
Re: [PATCH v4 3/4] xen: implement guest_physmap_(un)pin_range
On Thu, 2013-08-15 at 12:18 +0100, Stefano Stabellini wrote:> guest_physmap_pin_range pins a range of guest pages so that their p2m > mappings won''t be changed. > guest_physmap_unpin_range unpins the previously pinned pages. > The pinning is done using one of the spare bits in the p2m ptes. > > Use the newly introduce p2m_walker to implement the two functions on > ARM. > Provide empty stubs for x86. > > > Changes in v4: > - use p2m_walker to implement guest_physmap_pin_range and > guest_physmap_unpin_range; > - return -EINVAL when the P2M_DMA_PIN check fails; > - change the printk into a gdprintk; > - add a comment on what type of page can be pinned. > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > --- > xen/arch/arm/p2m.c | 45 ++++++++++++++++++++++++++++++++++++++++++++ > xen/include/asm-arm/mm.h | 4 +++ > xen/include/asm-arm/page.h | 7 +++++- > xen/include/asm-x86/p2m.h | 12 +++++++++++ > 4 files changed, 67 insertions(+), 1 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index a77df9a..c185328 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -127,6 +127,43 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr) > return p2m.maddr; > } > > + > +static int _guest_physmap_pin_range(lpae_t *ptep, void *arg)Single _ namespace is for the compiler. __ is, I think, legal in "kernel" space but actually this should just be called pin_one_pte or something like that.> +{ > + lpae_t pte = *ptep; > + if ( pte.p2m.avail & P2M_DMA_PIN ) > + return -EINVAL; > + pte.p2m.avail |= P2M_DMA_PIN;I think test_and_set_bit would be better here and you should operate on ptep directly IMHO rather than changing a copy and writing that. Might require a pte_modified macro with the flushing stuff from write_pte.> + write_pte(ptep, pte); > + return 0; > +} > + > +int guest_physmap_pin_range(struct domain *d, > + xen_pfn_t gpfn, > + unsigned int order) > +{ > + return p2m_walker(d, gpfn << PAGE_SHIFT, order, > + _guest_physmap_pin_range, NULL); > +} > + > +static int _guest_physmap_unpin_range(lpae_t *ptep, void *arg)unpin_on_pte or something.> +{ > + lpae_t pte = *ptep; > + if ( !pte.p2m.avail & P2M_DMA_PIN ) > + return -EINVAL; > + pte.p2m.avail &= ~P2M_DMA_PIN; > + write_pte(ptep, pte); > + return 0; > +} > + > +int guest_physmap_unpin_range(struct domain *d, > + xen_pfn_t gpfn, > + unsigned int order) > +{ > + return p2m_walker(d, gpfn << PAGE_SHIFT, order, > + _guest_physmap_unpin_range, NULL); > +} > + > int guest_physmap_mark_populate_on_demand(struct domain *d, > unsigned long gfn, > unsigned int order) > @@ -236,6 +273,14 @@ static int create_p2m_entries(struct domain *d, > cur_second_offset = second_table_offset(addr); > } > > + if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )test_bit.> + { > + rc = -EINVAL; > + gdprintk(XENLOG_WARNING, "cannot change p2m mapping for paddr=%"PRIpaddr > + " domid=%d, the page is pinned\n", addr, d->domain_id); > + goto out; > + } > + > flush = third[third_table_offset(addr)].p2m.valid; > > /* Allocate a new RAM page and attach */ > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h > index df6fea9..34167e6 100644 > --- a/xen/include/asm-arm/mm.h > +++ b/xen/include/asm-arm/mm.h > @@ -317,6 +317,10 @@ void free_init_memory(void); > > int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, > unsigned int order); > +int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn, > + unsigned int order); > +int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn, > + unsigned int order); > > extern void put_page_type(struct page_info *page); > static inline void put_page_and_type(struct page_info *page) > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h > index 41e9eff..c08cfca 100644 > --- a/xen/include/asm-arm/page.h > +++ b/xen/include/asm-arm/page.h > @@ -153,11 +153,16 @@ typedef struct { > unsigned long hint:1; /* In a block of 16 contiguous entries */ > unsigned long sbz2:1; > unsigned long xn:1; /* eXecute-Never */ > - unsigned long avail:4; /* Ignored by hardware */ > + unsigned long avail:4; /* Ignored by hardware, see below */ > > unsigned long sbz1:5; > } __attribute__((__packed__)) lpae_p2m_t; > > +/* Xen "avail" bits allocation in third level entries */ > +#define P2M_DMA_PIN (1<<0) /* The page has been "pinned": the hypervisor > + promises not to change the p2m mapping. > + Only normal r/w guest RAM can be pinned. */ > + > /* > * Walk is the common bits of p2m and pt entries which are needed to > * simply walk the table (e.g. for debug). > diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h > index 43583b2..b08a722 100644 > --- a/xen/include/asm-x86/p2m.h > +++ b/xen/include/asm-x86/p2m.h > @@ -492,6 +492,18 @@ void guest_physmap_remove_page(struct domain *d, > /* Set a p2m range as populate-on-demand */ > int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, > unsigned int order); > +static inline int guest_physmap_pin_range(struct domain *d, > + xen_pfn_t gpfn, > + unsigned int order) > +{ > + return -ENOSYS; > +} > +static inline int guest_physmap_unpin_range(struct domain *d, > + xen_pfn_t gpfn, > + unsigned int order) > +{ > + return -ENOSYS; > +} > > /* Change types across all p2m entries in a domain */ > void p2m_change_entry_type_global(struct domain *d,
Stefano Stabellini
2013-Sep-06 16:06 UTC
Re: [PATCH v4 3/4] xen: implement guest_physmap_(un)pin_range
On Mon, 19 Aug 2013, Ian Campbell wrote:> > +{ > > + lpae_t pte = *ptep; > > + if ( pte.p2m.avail & P2M_DMA_PIN ) > > + return -EINVAL; > > + pte.p2m.avail |= P2M_DMA_PIN; > > I think test_and_set_bit would be better here and you should operate on > ptep directly IMHO rather than changing a copy and writing that. Might > require a pte_modified macro with the flushing stuff from write_pte. >Considering that ptep->p2m.avail is a bit-field and that test_and_set_bit operates on a unsigned long pointer (one cannot get a pointer to a bit-field), and considering that avail is at bit 55 > 32 (we would need to take care of incrementing the pointer too), I think that it''s not worth making this change because it would make the code far less readable. In any case these two functions are called with the p2m lock held, so the current implementation should be safe.
Ian Campbell
2013-Sep-06 16:12 UTC
Re: [PATCH v4 3/4] xen: implement guest_physmap_(un)pin_range
On Fri, 2013-09-06 at 17:06 +0100, Stefano Stabellini wrote:> On Mon, 19 Aug 2013, Ian Campbell wrote: > > > +{ > > > + lpae_t pte = *ptep; > > > + if ( pte.p2m.avail & P2M_DMA_PIN ) > > > + return -EINVAL; > > > + pte.p2m.avail |= P2M_DMA_PIN; > > > > I think test_and_set_bit would be better here and you should operate on > > ptep directly IMHO rather than changing a copy and writing that. Might > > require a pte_modified macro with the flushing stuff from write_pte. > > > > Considering that ptep->p2m.avail is a bit-field and that > test_and_set_bit operates on a unsigned long pointer (one cannot get a > pointer to a bit-field), and considering that avail is at bit 55 > 32 > (we would need to take care of incrementing the pointer too),No you don''t, test_and_set_bit can cope with bitfields bigger than wordsize. All you would need is #define P2M_DMA_PIN 54 test_and_set_bit(P2M_DMA_PIN, &ptr.bits)> I think > that it''s not worth making this change because it would make the code > far less readable. > > In any case these two functions are called with the p2m lock held, so > the current implementation should be safe.Ah, in that case you can do either as you wish. Ian
Stefano Stabellini
2013-Sep-06 17:55 UTC
Re: [PATCH v4 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
On Thu, 15 Aug 2013, Jan Beulich wrote:> >>> On 15.08.13 at 13:18, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote: > > @@ -549,6 +549,15 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > > { > > for ( k = 0; k < (1UL << exch.out.extent_order); k++ ) > > set_gpfn_from_mfn(mfn + k, gpfn + k); > > + } > > + if ( op == XENMEM_exchange_and_pin ) > > + { > > + rc = guest_physmap_pin_range(d, gpfn, exch.out.extent_order); > > + if ( rc ) > > + continue; > > I guess you decided to use "continue" here because the other > paths do so too. Here, however, I don''t think that''s correct: The > pinning is an essential part of the operation, and the failure may > get masked by a later successful pin (because that could flush > rc back to zero) - as much as any other earlier failure that resulted > in rc being -EFAULT. > > So I''m afraid you have to go the unpleasant route here and do > the proper rollback if you encounter a failure here. Or see whether > the pinning can be moved even earlier.I''ll add a check for the validity of the range at the beginning of the function. It looks like the safest and simplest thing to do.> > +static long unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg) > > +{ > > + int rc; > > + xen_ulong_t i; > > Wasn''t it that xen_ulong_t is 64 bits wide on ARM32? If so, this > isn''t suitable as an iterator used to index arrays (or really array > like constructs), and compilers might even warn about this. > > > + struct xen_unpin unpin; > > + xen_pfn_t gpfn; > > + struct domain *d; > > + > > + if ( copy_from_guest(&unpin, arg, 1) ) > > + return -EFAULT; > > + > > + /* Various sanity checks. */ > > + if ( /* Extent orders are sensible? */ > > + (unpin.in.extent_order > MAX_ORDER) || > > + /* Sizes of input list do not overflow a long? */ > > + ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) ) > > Since you already bound check nr_extents against this unsigned > long value, making i unsigned long would be just fine.OK> > +struct xen_unpin { > > + /* > > + * [IN] Details of memory extents to be unpinned (GMFN bases). > > + * Note that @in.address_bits is ignored and unused. > > + */ > > + struct xen_memory_reservation in; > > +}; > > While the [IN] above is in line with your implementation - how does > a guest learn about partial success of the operation?I''ll add a nr_unpinned to the struct