Stefano Stabellini
2013-Sep-09 16:04 UTC
[PATCH v5 0/4] introduce XENMEM_exchange_and_pin and XENMEM_unpin
Hi all,
this patch series introduces two new hypercalls to allow autotranslate guests to allocate a contiguous buffer in machine addresses. XENMEM_exchange_and_pin returns the mfns and pins the pages, so that the hypervisor won't change their p2m mappings while they are in use. XENMEM_unpin simply unpins the pages.

Cheers,

Stefano

Changes in v5:
- move donate_page to common code;
- update commit message;
- align tests;
- comment p2m_walker;
- fix return codes in p2m_walker;
- handle superpages in p2m_walker;
- rename _p2m_lookup to p2m_lookup_f;
- return -EBUSY when the P2M_DMA_PIN check fails;
- rename _guest_physmap_pin_range to pin_one_pte;
- rename _guest_physmap_unpin_range to unpin_one_pte;
- memory_exchange: handle guest_physmap_pin_range failures;
- make i an unsigned long in unpin;
- add nr_unpinned to xen_unpin to report partial success.

Changes in v4:
- move steal_page to common code;
- introduce a generic p2m walker and use it in p2m_lookup, guest_physmap_pin_range and guest_physmap_unpin_range;
- return -EINVAL when the P2M_DMA_PIN check fails;
- change the printk into a gdprintk;
- add a comment on what type of page can be pinned;
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_put_dma_buf to XENMEM_unpin;
- move the pinning before we copy back the mfn to the guest;
- propagate errors returned by guest_physmap_pin_range;
- use xen_memory_exchange_t as parameter for XENMEM_exchange_and_pin;
- use an unsigned iterator in unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange.

Changes in v3:
- implement guest_physmap_pin_range and guest_physmap_unpin_range on ARM;
- implement XENMEM_put_dma_buf.

Changes in v2:
- actually don't print the warning more than once.
Stefano Stabellini (4):
      xen: move steal_page and donate_page to common code
      xen/arm: introduce a generic p2m walker and use it in p2m_lookup
      xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
      xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

 xen/arch/arm/mm.c           |   12 ---
 xen/arch/arm/p2m.c          |  177 +++++++++++++++++++++++++++++-----
 xen/arch/x86/mm.c           |   85 ----------------
 xen/common/memory.c         |  226 +++++++++++++++++++++++++++++++++++++------
 xen/include/asm-arm/mm.h    |    9 +-
 xen/include/asm-arm/page.h  |    7 +-
 xen/include/asm-x86/mm.h    |    5 -
 xen/include/asm-x86/p2m.h   |   12 +++
 xen/include/public/memory.h |   47 +++++++++
 xen/include/xen/mm.h        |    3 +
 10 files changed, 418 insertions(+), 165 deletions(-)
Stefano Stabellini
2013-Sep-09 16:06 UTC
[PATCH v5 1/4] xen: move steal_page and donate_page to common code
Only code movement, except for a small change to the warning printed in case of an error in donate_page. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: keir@xen.org CC: JBeulich@suse.com Changes in v5: - move donate_page to common code; - update commit message. Changes in v4: - move steal_page to common code. --- xen/arch/arm/mm.c | 12 ------ xen/arch/x86/mm.c | 85 ---------------------------------------------- xen/common/memory.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/mm.h | 5 --- xen/include/asm-x86/mm.h | 5 --- xen/include/xen/mm.h | 3 ++ 6 files changed, 88 insertions(+), 107 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index f301e65..89b9d27 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -742,18 +742,6 @@ void arch_dump_shared_mem_info(void) { } -int donate_page(struct domain *d, struct page_info *page, unsigned int memflags) -{ - ASSERT(0); - return -ENOSYS; -} - -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags) -{ - return -1; -} - int page_is_ram_type(unsigned long mfn, unsigned long mem_type) { ASSERT(0); diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index e7f0e13..a512046 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4124,91 +4124,6 @@ int replace_grant_host_mapping( return rc; } -int donate_page( - struct domain *d, struct page_info *page, unsigned int memflags) -{ - spin_lock(&d->page_alloc_lock); - - if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) ) - goto fail; - - if ( d->is_dying ) - goto fail; - - if ( page->count_info & ~(PGC_allocated | 1) ) - goto fail; - - if ( !(memflags & MEMF_no_refcount) ) - { - if ( d->tot_pages >= d->max_pages ) - goto fail; - domain_adjust_tot_pages(d, 1); - } - - page->count_info = PGC_allocated | 1; - page_set_owner(page, d); - page_list_add_tail(page,&d->page_list); - - spin_unlock(&d->page_alloc_lock); - return 0; - - fail: - spin_unlock(&d->page_alloc_lock); - 
MEM_LOG("Bad donate %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%" PRtype_info, - (void *)page_to_mfn(page), d, d->domain_id, - page_get_owner(page), page->count_info, page->u.inuse.type_info); - return -1; -} - -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags) -{ - unsigned long x, y; - bool_t drop_dom_ref = 0; - - spin_lock(&d->page_alloc_lock); - - if ( is_xen_heap_page(page) || (page_get_owner(page) != d) ) - goto fail; - - /* - * We require there is just one reference (PGC_allocated). We temporarily - * drop this reference now so that we can safely swizzle the owner. - */ - y = page->count_info; - do { - x = y; - if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) ) - goto fail; - y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask); - } while ( y != x ); - - /* Swizzle the owner then reinstate the PGC_allocated reference. */ - page_set_owner(page, NULL); - y = page->count_info; - do { - x = y; - BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated); - } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x ); - - /* Unlink from original owner. 
*/ - if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) ) - drop_dom_ref = 1; - page_list_del(page, &d->page_list); - - spin_unlock(&d->page_alloc_lock); - if ( unlikely(drop_dom_ref) ) - put_domain(d); - return 0; - - fail: - spin_unlock(&d->page_alloc_lock); - MEM_LOG("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%" PRtype_info, - (void *)page_to_mfn(page), d, d->domain_id, - page_get_owner(page), page->count_info, page->u.inuse.type_info); - return -1; -} - static int __do_update_va_mapping( unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner) { diff --git a/xen/common/memory.c b/xen/common/memory.c index 50b740f..4b2f311 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -230,6 +230,91 @@ int guest_remove_page(struct domain *d, unsigned long gmfn) return 1; } +int donate_page( + struct domain *d, struct page_info *page, unsigned int memflags) +{ + spin_lock(&d->page_alloc_lock); + + if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) ) + goto fail; + + if ( d->is_dying ) + goto fail; + + if ( page->count_info & ~(PGC_allocated | 1) ) + goto fail; + + if ( !(memflags & MEMF_no_refcount) ) + { + if ( d->tot_pages >= d->max_pages ) + goto fail; + domain_adjust_tot_pages(d, 1); + } + + page->count_info = PGC_allocated | 1; + page_set_owner(page, d); + page_list_add_tail(page,&d->page_list); + + spin_unlock(&d->page_alloc_lock); + return 0; + + fail: + spin_unlock(&d->page_alloc_lock); + gdprintk(XENLOG_WARNING, "Bad donate %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%016lx", + (void *)page_to_mfn(page), d, d->domain_id, + page_get_owner(page), page->count_info, page->u.inuse.type_info); + return -1; +} + +int steal_page( + struct domain *d, struct page_info *page, unsigned int memflags) +{ + unsigned long x, y; + bool_t drop_dom_ref = 0; + + spin_lock(&d->page_alloc_lock); + + if ( is_xen_heap_page(page) || (page_get_owner(page) != d) ) + goto fail; + + /* + * We require there is just one reference 
(PGC_allocated). We temporarily + * drop this reference now so that we can safely swizzle the owner. + */ + y = page->count_info; + do { + x = y; + if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) ) + goto fail; + y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask); + } while ( y != x ); + + /* Swizzle the owner then reinstate the PGC_allocated reference. */ + page_set_owner(page, NULL); + y = page->count_info; + do { + x = y; + BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated); + } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x ); + + /* Unlink from original owner. */ + if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) ) + drop_dom_ref = 1; + page_list_del(page, &d->page_list); + + spin_unlock(&d->page_alloc_lock); + if ( unlikely(drop_dom_ref) ) + put_domain(d); + return 0; + + fail: + spin_unlock(&d->page_alloc_lock); + printk("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%lx\n", + (void *)page_to_mfn(page), d, d->domain_id, + page_get_owner(page), page->count_info, page->u.inuse.type_info); + return -1; +} + static void decrease_reservation(struct memop_args *a) { unsigned long i, j; diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 5e7c5a3..0b0a457 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -297,11 +297,6 @@ static inline int relinquish_shared_pages(struct domain *d) /* Arch-specific portion of memory_op hypercall. 
*/ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags); -int donate_page( - struct domain *d, struct page_info *page, unsigned int memflags); - #define domain_set_alloc_bitsize(d) ((void)0) #define domain_clamp_alloc_bitsize(d, b) (b) diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index 213fc9c..e5254ee 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -566,11 +566,6 @@ long subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); int compat_arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void)); int compat_subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void)); -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags); -int donate_page( - struct domain *d, struct page_info *page, unsigned int memflags); - int map_ldt_shadow_page(unsigned int); #define NIL(type) ((type *)-sizeof(type)) diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h index 4f5795c..1dc859b 100644 --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -60,6 +60,9 @@ void destroy_xen_mappings(unsigned long v, unsigned long e); unsigned long domain_adjust_tot_pages(struct domain *d, long pages); int domain_set_outstanding_pages(struct domain *d, unsigned long pages); void get_outstanding_claims(uint64_t *free_pages, uint64_t *outstanding_pages); +int steal_page(struct domain *d, struct page_info *page, unsigned int memflags); +int donate_page(struct domain *d, struct page_info *page, unsigned int memflags); + /* Domain suballocator. These functions are *not* interrupt-safe.*/ void init_domheap_pages(paddr_t ps, paddr_t pe); -- 1.7.2.5
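The core of the moved steal_page is its cmpxchg retry loop: the single PGC_allocated reference is dropped atomically so the owner can be swizzled without racing against concurrent refcount updates. A minimal standalone sketch of the same pattern using C11 atomics (flag layout and names are illustrative, not the real page_info encoding):

```c
#include <stdatomic.h>

#define PGC_ALLOCATED  (1UL << 31)          /* hypothetical "allocated" flag bit */
#define PGC_COUNT_MASK ((1UL << 31) - 1)    /* hypothetical reference-count field */

/* Drop the sole reference, mimicking steal_page's loop: fail unless
 * count_info is exactly (PGC_ALLOCATED | 1), i.e. allocated with one ref. */
static int drop_single_ref(_Atomic unsigned long *count_info)
{
    unsigned long x = atomic_load(count_info);
    for (;;) {
        if ((x & (PGC_COUNT_MASK | PGC_ALLOCATED)) != (PGC_ALLOCATED | 1))
            return -1;  /* page busy: extra references, or not allocated */
        /* On failure the CAS reloads x with the current value and we retry,
         * re-validating the condition, exactly as the do/while cmpxchg does. */
        if (atomic_compare_exchange_weak(count_info, &x, x & ~PGC_COUNT_MASK))
            return 0;   /* count field cleared atomically */
    }
}
```

The re-check on every iteration is what makes the swizzle safe: any interleaved get_page() bumps the count and the loop bails out instead of clearing it.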
Stefano Stabellini
2013-Sep-09 16:06 UTC
[PATCH v5 2/4] xen/arm: introduce a generic p2m walker and use it in p2m_lookup
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Changes in v5: - align tests; - comment p2m_walker; - fix return codes in p2m_walker; - handle superpages in p2m_walker; - rename _p2m_lookup to p2m_lookup_f. --- xen/arch/arm/p2m.c | 131 +++++++++++++++++++++++++++++++++++++++++---------- 1 files changed, 105 insertions(+), 26 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 307c6d4..a9ceacf 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -31,48 +31,127 @@ void p2m_load_VTTBR(struct domain *d) } /* - * Lookup the MFN corresponding to a domain''s PFN. + * d: domain p2m to walk + * paddr: the guest start physical address + * order: page order + * func: function to call for each 2-stage lpae_t entry found + * arg: opaque pointer to pass to func * - * There are no processor functions to do a stage 2 only lookup therefore we - * do a a software walk. */ -paddr_t p2m_lookup(struct domain *d, paddr_t paddr) +static int p2m_walker(struct domain *d, paddr_t paddr, unsigned int order, + int (*func)(lpae_t *pte, void *arg, int level), void* arg) { + lpae_t *first = NULL, *second = NULL, *third = NULL; struct p2m_domain *p2m = &d->arch.p2m; - lpae_t pte, *first = NULL, *second = NULL, *third = NULL; - paddr_t maddr = INVALID_PADDR; + int rc = -EFAULT, level = 1; + unsigned long cur_first_offset = ~0, cur_second_offset = ~0; + paddr_t pend = paddr + (1UL << order); spin_lock(&p2m->lock); first = __map_domain_page(p2m->first_level); - pte = first[first_table_offset(paddr)]; - if ( !pte.p2m.valid || !pte.p2m.table ) - goto done; + if ( !first || + !first[first_table_offset(paddr)].p2m.valid ) + goto err; - second = map_domain_page(pte.p2m.base); - pte = second[second_table_offset(paddr)]; - if ( !pte.p2m.valid || !pte.p2m.table ) - goto done; + if ( !first[first_table_offset(paddr)].p2m.table ) + { + rc = func(&first[first_table_offset(paddr)], arg, level); + if ( rc != 0 ) + goto err; + paddr += FIRST_SIZE; + } - third = 
map_domain_page(pte.p2m.base); - pte = third[third_table_offset(paddr)]; + while ( paddr < pend ) + { + rc = -EFAULT; + level = 1; - /* This bit must be one in the level 3 entry */ - if ( !pte.p2m.table ) - pte.bits = 0; + if ( cur_first_offset != first_table_offset(paddr) ) + { + if (second) unmap_domain_page(second); + second = map_domain_page(first[first_table_offset(paddr)].p2m.base); + cur_first_offset = first_table_offset(paddr); + } + level++; + if ( !second || + !second[second_table_offset(paddr)].p2m.valid ) + goto err; + if ( !second[second_table_offset(paddr)].p2m.table ) + { + rc = func(&first[first_table_offset(paddr)], arg, level); + if ( rc != 0 ) + goto err; + paddr += SECOND_SIZE; + } -done: - if ( pte.p2m.valid ) - maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr & ~PAGE_MASK); + if ( cur_second_offset != second_table_offset(paddr) ) + { + if (third) unmap_domain_page(third); + third = map_domain_page(second[second_table_offset(paddr)].p2m.base); + cur_second_offset = second_table_offset(paddr); + } + level++; + if ( !third || + !third[third_table_offset(paddr)].p2m.valid ) + goto err; - if (third) unmap_domain_page(third); - if (second) unmap_domain_page(second); - if (first) unmap_domain_page(first); + rc = func(&third[third_table_offset(paddr)], arg, level); + if ( rc != 0 ) + goto err; + + paddr += PAGE_SIZE; + } + + rc = 0; + +err: + if ( third ) unmap_domain_page(third); + if ( second ) unmap_domain_page(second); + if ( first ) unmap_domain_page(first); spin_unlock(&p2m->lock); - return maddr; + return rc; +} + +struct p2m_lookup_t { + paddr_t paddr; + paddr_t maddr; +}; + +static int p2m_lookup_f(lpae_t *ptep, void *arg, int level) +{ + lpae_t pte; + struct p2m_lookup_t *p2m = (struct p2m_lookup_t *)arg; + ASSERT(level == 3); + + pte = *ptep; + + /* This bit must be one in the level 3 entry */ + if ( !pte.p2m.table || !pte.p2m.valid ) + return -EFAULT; + + p2m->maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | + (p2m->paddr & ~PAGE_MASK); + 
return 0; +} +/* + * Lookup the MFN corresponding to a domain''s PFN. + * + * There are no processor functions to do a stage 2 only lookup therefore we + * do a a software walk. + */ +paddr_t p2m_lookup(struct domain *d, paddr_t paddr) +{ + struct p2m_lookup_t p2m; + p2m.paddr = paddr; + p2m.maddr = INVALID_PADDR; + + p2m_walker(d, paddr, 0, p2m_lookup_f, &p2m); + + return p2m.maddr; } int guest_physmap_mark_populate_on_demand(struct domain *d, -- 1.7.2.5
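The shape of p2m_walker — iterate a guest physical range, descend through table levels, invoke a callback on each leaf, and skip ahead by the superpage size when a non-table entry is found — can be illustrated with a self-contained toy walker. This is a sketch under simplified assumptions (two levels, 16 entries per table, invented `struct entry`), not the real lpae_t layout:

```c
#include <stddef.h>

/* Toy two-level translation table: a level-1 entry either points to a
 * 16-entry level-2 table (table = 1) or is a "superpage" leaf covering
 * all 16 slots (table = 0). Names are illustrative only. */
struct entry {
    int valid, table;
    union {
        struct entry *next;     /* table == 1: next-level table */
        unsigned long leaf;     /* leaf value */
    };
};

typedef int (*walk_fn)(struct entry *e, void *arg, int level);

/* Visit every leaf backing slots [idx, idx + n): call func once per leaf,
 * return on first error, and skip the remainder of a superpage after
 * visiting it once -- the same contract as p2m_walker's callback. */
static int walk(struct entry *top, size_t idx, size_t n, walk_fn func, void *arg)
{
    for (size_t i = idx; i < idx + n; i++) {
        struct entry *e1 = &top[i >> 4];
        if (!e1->valid)
            return -1;                 /* -EFAULT in the hypervisor code */
        if (!e1->table) {              /* superpage leaf at level 1 */
            int rc = func(e1, arg, 1);
            if (rc)
                return rc;
            i |= 0xf;                  /* skip the rest of this superpage */
            continue;
        }
        struct entry *e2 = &e1->next[i & 0xf];
        if (!e2->valid)
            return -1;
        int rc = func(e2, arg, 2);     /* ordinary leaf at level 2 */
        if (rc)
            return rc;
    }
    return 0;
}

/* Example callback: accumulate leaf values, much as p2m_lookup_f pulls
 * the machine address out of the level-3 entry it is handed. */
static int add_leaf(struct entry *e, void *arg, int level)
{
    (void)level;
    *(unsigned long *)arg += e->leaf;
    return 0;
}
```

Factoring the walk this way is what lets p2m_lookup, pin and unpin share one traversal: each supplies only a per-leaf callback and an opaque argument.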
Stefano Stabellini
2013-Sep-09 16:06 UTC
[PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
guest_physmap_pin_range pins a range of guest pages so that their p2m mappings won''t be changed. guest_physmap_unpin_range unpins the previously pinned pages. The pinning is done using one of the spare bits in the p2m ptes. Use the newly introduce p2m_walker to implement the two functions on ARM. Provide empty stubs for x86. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Changes in v5: - return -EBUSY when the P2M_DMA_PIN check fails; - rename _guest_physmap_pin_range to pin_one_pte; - rename _guest_physmap_unpin_range to unpin_one_pte. Changes in v4: - use p2m_walker to implement guest_physmap_pin_range and guest_physmap_unpin_range; - return -EINVAL when the P2M_DMA_PIN check fails; - change the printk into a gdprintk; - add a comment on what type of page can be pinned. --- xen/arch/arm/p2m.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/mm.h | 4 +++ xen/include/asm-arm/page.h | 7 +++++- xen/include/asm-x86/p2m.h | 12 +++++++++++ 4 files changed, 70 insertions(+), 1 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index a9ceacf..bac6c7e 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -154,6 +154,46 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr) return p2m.maddr; } +static int pin_one_pte(lpae_t *ptep, void *arg, int level) +{ + lpae_t pte = *ptep; + ASSERT(level == 3); + + if ( pte.p2m.avail & P2M_DMA_PIN ) + return -EBUSY; + pte.p2m.avail |= P2M_DMA_PIN; + write_pte(ptep, pte); + return 0; +} + +int guest_physmap_pin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int order) +{ + return p2m_walker(d, gpfn << PAGE_SHIFT, order, + pin_one_pte, NULL); +} + +static int unpin_one_pte(lpae_t *ptep, void *arg, int level) +{ + lpae_t pte = *ptep; + ASSERT(level == 3); + + if ( !pte.p2m.avail & P2M_DMA_PIN ) + return -EBUSY; + pte.p2m.avail &= ~P2M_DMA_PIN; + write_pte(ptep, pte); + return 0; +} + +int guest_physmap_unpin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int 
order) +{ + return p2m_walker(d, gpfn << PAGE_SHIFT, order, + unpin_one_pte, NULL); +} + int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, unsigned int order) @@ -263,6 +303,14 @@ static int create_p2m_entries(struct domain *d, cur_second_offset = second_table_offset(addr); } + if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN ) + { + rc = -EINVAL; + gdprintk(XENLOG_WARNING, "cannot change p2m mapping for paddr=%"PRIpaddr + " domid=%d, the page is pinned\n", addr, d->domain_id); + goto out; + } + flush = third[third_table_offset(addr)].p2m.valid; /* Allocate a new RAM page and attach */ diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 0b0a457..d6ae3ad 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -314,6 +314,10 @@ void free_init_memory(void); int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, unsigned int order); +int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn, + unsigned int order); +int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn, + unsigned int order); extern void put_page_type(struct page_info *page); static inline void put_page_and_type(struct page_info *page) diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h index 41e9eff..c08cfca 100644 --- a/xen/include/asm-arm/page.h +++ b/xen/include/asm-arm/page.h @@ -153,11 +153,16 @@ typedef struct { unsigned long hint:1; /* In a block of 16 contiguous entries */ unsigned long sbz2:1; unsigned long xn:1; /* eXecute-Never */ - unsigned long avail:4; /* Ignored by hardware */ + unsigned long avail:4; /* Ignored by hardware, see below */ unsigned long sbz1:5; } __attribute__((__packed__)) lpae_p2m_t; +/* Xen "avail" bits allocation in third level entries */ +#define P2M_DMA_PIN (1<<0) /* The page has been "pinned": the hypervisor + promises not to change the p2m mapping. + Only normal r/w guest RAM can be pinned. 
*/ + /* * Walk is the common bits of p2m and pt entries which are needed to * simply walk the table (e.g. for debug). diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 43583b2..b08a722 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -492,6 +492,18 @@ void guest_physmap_remove_page(struct domain *d, /* Set a p2m range as populate-on-demand */ int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, unsigned int order); +static inline int guest_physmap_pin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int order) +{ + return -ENOSYS; +} +static inline int guest_physmap_unpin_range(struct domain *d, + xen_pfn_t gpfn, + unsigned int order) +{ + return -ENOSYS; +} /* Change types across all p2m entries in a domain */ void p2m_change_entry_type_global(struct domain *d, -- 1.7.2.5
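The pin/unpin state machine above is small: set a software-available bit in the leaf pte, fail with -EBUSY if it is already in the requested state. A minimal standalone sketch (toy pte struct, invented names; the pin-bit tests are written fully parenthesized):

```c
#define P2M_DMA_PIN (1u << 0)   /* mirrors the patch's avail-bit allocation */
#define EBUSY_ERR   (-16)       /* stand-in for the kernel's -EBUSY */

/* Cut-down pte carrying only the fields the pin helpers touch. */
struct toy_pte {
    unsigned int valid:1;
    unsigned int avail:4;       /* software bits, ignored by hardware */
};

/* Same logic as pin_one_pte: refuse to double-pin. */
static int pin(struct toy_pte *pte)
{
    if (pte->avail & P2M_DMA_PIN)
        return EBUSY_ERR;
    pte->avail |= P2M_DMA_PIN;
    return 0;
}

/* Same logic as unpin_one_pte: refuse to unpin an unpinned page. */
static int unpin(struct toy_pte *pte)
{
    if (!(pte->avail & P2M_DMA_PIN))
        return EBUSY_ERR;
    pte->avail &= ~P2M_DMA_PIN;
    return 0;
}
```

Keeping the bit in the pte itself is what makes the create_p2m_entries check cheap: any mapping change just tests P2M_DMA_PIN on the entry it is about to overwrite.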
Stefano Stabellini
2013-Sep-09 16:06 UTC
[PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
Introduce two new hypercalls: XENMEM_exchange_and_pin and XENMEM_unpin.

XENMEM_exchange_and_pin is like XENMEM_exchange, but it also pins the new pages: their p2m mappings are guaranteed not to be changed until XENMEM_unpin is called. XENMEM_exchange_and_pin returns the DMA frame numbers of the new pages to the caller, even if it's an autotranslate guest.

The only effect of XENMEM_unpin is to "unpin" the previously pinned pages. Afterwards the p2m mappings can be transparently changed by the hypervisor as normal. The memory remains accessible from the guest.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Changes in v5:
- memory_exchange: handle guest_physmap_pin_range failures;
- make i an unsigned long in unpin;
- add nr_unpinned to xen_unpin to report partial success.

Changes in v4:
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_put_dma_buf to XENMEM_unpin;
- move the pinning before we copy back the mfn to the guest;
- propagate errors returned by guest_physmap_pin_range;
- use xen_memory_exchange_t as parameter for XENMEM_exchange_and_pin;
- use an unsigned iterator in unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange.
--- xen/common/memory.c | 141 +++++++++++++++++++++++++++++++++---------- xen/include/public/memory.h | 47 ++++++++++++++ 2 files changed, 156 insertions(+), 32 deletions(-) diff --git a/xen/common/memory.c b/xen/common/memory.c index 4b2f311..34169c1 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -364,14 +364,14 @@ static void decrease_reservation(struct memop_args *a) a->nr_done = i; } -static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) +static long memory_exchange(int op, XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) { struct xen_memory_exchange exch; PAGE_LIST_HEAD(in_chunk_list); PAGE_LIST_HEAD(out_chunk_list); unsigned long in_chunk_order, out_chunk_order; xen_pfn_t gpfn, gmfn, mfn; - unsigned long i, j, k = 0; /* gcc ... */ + unsigned long i = 0, j = 0, k = 0; /* gcc ... */ unsigned int memflags = 0; long rc = 0; struct domain *d; @@ -542,54 +542,72 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) /* Assign each output page to the domain. */ for ( j = 0; (page = page_list_remove_head(&out_chunk_list)); ++j ) { - if ( assign_pages(d, page, exch.out.extent_order, - MEMF_no_refcount) ) - { - unsigned long dec_count; - bool_t drop_dom_ref; - - /* - * Pages in in_chunk_list is stolen without - * decreasing the tot_pages. If the domain is dying when - * assign pages, we need decrease the count. For those pages - * that has been assigned, it should be covered by - * domain_relinquish_resources(). 
- */ - dec_count = (((1UL << exch.in.extent_order) * - (1UL << in_chunk_order)) - - (j * (1UL << exch.out.extent_order))); - - spin_lock(&d->page_alloc_lock); - domain_adjust_tot_pages(d, -dec_count); - drop_dom_ref = (dec_count && !d->tot_pages); - spin_unlock(&d->page_alloc_lock); - - if ( drop_dom_ref ) - put_domain(d); - - free_domheap_pages(page, exch.out.extent_order); - goto dying; - } + unsigned long dec_count; + bool_t drop_dom_ref; if ( __copy_from_guest_offset(&gpfn, exch.out.extent_start, (i << out_chunk_order) + j, 1) ) { rc = -EFAULT; - continue; + goto extent_error; } mfn = page_to_mfn(page); guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order); + if ( op == XENMEM_exchange_and_pin ) + { + if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) ) + { + rc = -EFAULT; + goto extent_error_physmap; + } + } + if ( !paging_mode_translate(d) ) { for ( k = 0; k < (1UL << exch.out.extent_order); k++ ) set_gpfn_from_mfn(mfn + k, gpfn + k); + } + + rc = assign_pages(d, page, exch.out.extent_order, MEMF_no_refcount); + if ( rc ) + goto extent_error_physmap; + + if ( op == XENMEM_exchange_and_pin || !paging_mode_translate(d) ) + { if ( __copy_to_guest_offset(exch.out.extent_start, (i << out_chunk_order) + j, &mfn, 1) ) rc = -EFAULT; } + + continue; + +extent_error_physmap: + guest_physmap_remove_page(d, gpfn, mfn, exch.out.extent_order); +extent_error: + /* + * Pages in in_chunk_list is stolen without + * decreasing the tot_pages. If the domain is dying when + * assign pages, we need decrease the count. For those pages + * that has been assigned, it should be covered by + * domain_relinquish_resources(). 
+ */ + dec_count = (((1UL << exch.in.extent_order) * + (1UL << in_chunk_order)) - + (j * (1UL << exch.out.extent_order))); + + spin_lock(&d->page_alloc_lock); + domain_adjust_tot_pages(d, -dec_count); + drop_dom_ref = (dec_count && !d->tot_pages); + spin_unlock(&d->page_alloc_lock); + + if ( drop_dom_ref ) + put_domain(d); + + free_domheap_pages(page, exch.out.extent_order); + goto dying; } BUG_ON( !(d->is_dying) && (j != (1UL << out_chunk_order)) ); } @@ -627,6 +645,60 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) return rc; } +static long unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg) +{ + int rc; + unsigned long i; + struct xen_unpin unpin; + xen_pfn_t gpfn; + struct domain *d; + + if ( copy_from_guest(&unpin, arg, 1) ) + return -EFAULT; + + /* Various sanity checks. */ + if ( /* Extent orders are sensible? */ + (unpin.in.extent_order > MAX_ORDER) || + /* Sizes of input list do not overflow a long? */ + ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) ) + return -EFAULT; + + if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) ) + return -EFAULT; + + d = rcu_lock_domain_by_any_id(unpin.in.domid); + if ( d == NULL ) + { + rc = -ESRCH; + goto fail; + } + + for ( i = 0; i < unpin.in.nr_extents; i++ ) + { + if ( unlikely(__copy_from_guest_offset( + &gpfn, unpin.in.extent_start, i, 1)) ) + { + rc = -EFAULT; + goto partial; + } + + rc = guest_physmap_unpin_range(d, gpfn, unpin.in.extent_order); + if ( rc ) + goto partial; + } + + rc = 0; + +partial: + unpin.nr_unpinned = i; + if ( __copy_field_to_guest(arg, &unpin, nr_unpinned) ) + rc = -EFAULT; + + fail: + rcu_unlock_domain(d); + return rc; +} + long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { struct domain *d; @@ -715,8 +787,13 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; + case XENMEM_exchange_and_pin: case XENMEM_exchange: - rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t)); + rc 
= memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t)); + break; + + case XENMEM_unpin: + rc = unpin(guest_handle_cast(arg, xen_unpin_t)); break; case XENMEM_maximum_ram_page: diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index 7a26dee..73fdc31 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -105,6 +105,7 @@ struct xen_memory_exchange { /* * [IN] Details of memory extents to be exchanged (GMFN bases). * Note that @in.address_bits is ignored and unused. + * @out.address_bits should contain the address mask for the new pages. */ struct xen_memory_reservation in; @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t); * The zero value is appropiate. */ +#define XENMEM_exchange_and_pin 26 +/* + * This hypercall is similar to XENMEM_exchange: it takes the same + * struct as an argument and it exchanges the pages passed in with a new + * set of pages. The new pages are going to be "pinned": it''s guaranteed + * that their p2m mapping won''t be changed until explicitly "unpinned". + * The content of the exchanged pages is lost. + * Only normal guest r/w memory can be pinned: no granted pages or + * ballooned pages. + * If return code is zero then @out.extent_list provides the DMA frame + * numbers of the newly-allocated memory. + * Returns zero on complete success, otherwise a negative error code: + * -ENOSYS if not implemented + * -EINVAL if the page is already pinned + * -EFAULT if an internal error occurs + * On complete success then always @nr_exchanged == @in.nr_extents. On + * partial success @nr_exchanged indicates how much work was done and a + * negative error code is returned. + */ + +#define XENMEM_unpin 27 +/* + * XENMEM_unpin unpins a set of pages, previously pinned by + * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can + * be transparently changed by the hypervisor, as usual. The pages are + * still accessible from the guest. 
+ */ +struct xen_unpin { + /* + * [IN] Details of memory extents to be unpinned (GMFN bases). + * Note that @in.address_bits is ignored and unused. + */ + struct xen_memory_reservation in; + /* + * [OUT] Number of input extents that were successfully unpinned. + * 1. The first @nr_unpinned input extents were successfully + * unpinned. + * 2. All other input extents are untouched. + * 3. If not all input exents are unpinned then the return code of this + * command will be non-zero. + */ + xen_ulong_t nr_unpinned; +}; +typedef struct xen_unpin xen_unpin_t; +DEFINE_XEN_GUEST_HANDLE(xen_unpin_t); + #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */ #endif /* __XEN_PUBLIC_MEMORY_H__ */ -- 1.7.2.5
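The partial-success contract that nr_unpinned documents — process extents in order, stop at the first failure, report how many leading extents completed — can be sketched standalone. The per-extent helper below is hypothetical (it fails on even gpfns purely to exercise the error path):

```c
#include <stddef.h>

/* Hypothetical per-extent unpin: pretend odd gpfns are pinned and
 * succeed, while even gpfns fail as if never pinned. */
static int unpin_extent(unsigned long gpfn)
{
    return (gpfn & 1) ? 0 : -16;   /* -EBUSY for an unpinned page */
}

/* Mirror XENMEM_unpin's contract: on error, *nr_unpinned holds the count
 * of leading extents that were fully processed; later extents are
 * untouched; the return code is the first failure. */
static int do_unpin(const unsigned long *extents, size_t n,
                    unsigned long *nr_unpinned)
{
    size_t i;
    int rc = 0;
    for (i = 0; i < n; i++) {
        rc = unpin_extent(extents[i]);
        if (rc)
            break;
    }
    *nr_unpinned = i;              /* copied back even on failure */
    return rc;
}
```

Reporting the count on the failure path as well as the success path is the point of the v5 change: the caller can retry or clean up from exactly where the hypercall stopped.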
Ian Campbell
2013-Sep-10 09:08 UTC
Re: [PATCH v5 1/4] xen: move steal_page and donate_page to common code
On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:
> Only code movement, except for a small change to the warning printed in
> case of an error in donate_page.
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> CC: keir@xen.org
> CC: JBeulich@suse.com

Not much to it on the ARM side but:

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell
2013-Sep-10 09:26 UTC
Re: [PATCH v5 2/4] xen/arm: introduce a generic p2m walker and use it in p2m_lookup
On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > > Changes in v5: > - align tests; > - comment p2m_walker; > - fix return codes in p2m_walker; > - handle superpages in p2m_walker; > - rename _p2m_lookup to p2m_lookup_f. > --- > xen/arch/arm/p2m.c | 131 +++++++++++++++++++++++++++++++++++++++++---------- > 1 files changed, 105 insertions(+), 26 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index 307c6d4..a9ceacf 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -31,48 +31,127 @@ void p2m_load_VTTBR(struct domain *d) > } > > /* > - * Lookup the MFN corresponding to a domain''s PFN. > + * d: domain p2m to walk > + * paddr: the guest start physical address > + * order: page order > + * func: function to call for each 2-stage lpae_t entry founds/2-stage/stage-2/ Also it is called for each *leaf* entry, not for all of them. It could be argued that either was a useful interface in general. This function is not actually all the p2m specific in the end, by using lpae_t.walk instead of lpae_t.pt (see dump_pt_walk) you could make it useful for e.g. hyp pagetable walks too. (e.g. dump_hyp_walk). In theory it could also be used for LPAE guest walks too, but we''d need a separate walker for non-LPAE guests. If we wanted to replace dump_hyp_walk with this then calling the callback for each level would be required.> + * arg: opaque pointer to pass to func > * > - * There are no processor functions to do a stage 2 only lookup therefore we > - * do a a software walk. 
> */ > -paddr_t p2m_lookup(struct domain *d, paddr_t paddr) > +static int p2m_walker(struct domain *d, paddr_t paddr, unsigned int order, > + int (*func)(lpae_t *pte, void *arg, int level), void* arg) > { > + lpae_t *first = NULL, *second = NULL, *third = NULL; > struct p2m_domain *p2m = &d->arch.p2m; > - lpae_t pte, *first = NULL, *second = NULL, *third = NULL; > - paddr_t maddr = INVALID_PADDR; > + int rc = -EFAULT, level = 1; > + unsigned long cur_first_offset = ~0, cur_second_offset = ~0; > + paddr_t pend = paddr + (1UL << order);1UL is 32 bits on arm32, but paddr_t is 64 bits, so this might fail for large enough order? I think you need something like: (((paddr_t)1UL)<<order). Or maybe a helpful macro.> > spin_lock(&p2m->lock); > > first = __map_domain_page(p2m->first_level); > > - pte = first[first_table_offset(paddr)]; > - if ( !pte.p2m.valid || !pte.p2m.table ) > - goto done; > + if ( !first || > + !first[first_table_offset(paddr)].p2m.valid ) > + goto err;Why is the first level handled specially outside the loop? What happens if order is such that we span multiple first level entries?> - second = map_domain_page(pte.p2m.base); > - pte = second[second_table_offset(paddr)]; > - if ( !pte.p2m.valid || !pte.p2m.table ) > - goto done; > + if ( !first[first_table_offset(paddr)].p2m.table ) > + { > + rc = func(&first[first_table_offset(paddr)], arg, level); > + if ( rc != 0 ) > + goto err; > + paddr += FIRST_SIZE; > + } > > - third = map_domain_page(pte.p2m.base); > - pte = third[third_table_offset(paddr)]; > + while ( paddr < pend ) > + { > + rc = -EFAULT; > + level = 1;I think this and all the level++ stuff could be replaced by literal 1, 2, and 3 at the appropriate callback sites. 
The level is statically implied by the code.

>
> - /* This bit must be one in the level 3 entry */
> - if ( !pte.p2m.table )
> - pte.bits = 0;
> + if ( cur_first_offset != first_table_offset(paddr) )
> + {
> + if (second) unmap_domain_page(second);
> + second = map_domain_page(first[first_table_offset(paddr)].p2m.base);
> + cur_first_offset = first_table_offset(paddr);
> + }
> + level++;
> + if ( !second ||

ASSERT(second) seems reasonable here, I think, since it would indicate
we had screwed up the p2m pretty badly. That would...

> + !second[second_table_offset(paddr)].p2m.valid )
> + goto err;

... leave the -EFAULT resulting from this for a failure of the actual
walk itself.

> + if ( !second[second_table_offset(paddr)].p2m.table )
> + {
> + rc = func(&first[first_table_offset(paddr)], arg, level);
> + if ( rc != 0 )
> + goto err;
> + paddr += SECOND_SIZE;

You need to continue here, don't you? Otherwise you will look into the
3rd level for the paddr + SECOND_SIZE address before you should have
done.
Likewise for the 1st level stuff once it is properly within the loop (I
wonder if this is what caused you to pull that out?)

> + }
>
> -done:
> - if ( pte.p2m.valid )
> - maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr & ~PAGE_MASK);
> + if ( cur_second_offset != second_table_offset(paddr) )
> + {
> + if (third) unmap_domain_page(third);
> + third = map_domain_page(second[second_table_offset(paddr)].p2m.base);
> + cur_second_offset = second_table_offset(paddr);
> + }
> + level++;
> + if ( !third ||
> + !third[third_table_offset(paddr)].p2m.valid )
> + goto err;

If third...table is 0 then that is an error too, iff valid == 1.

>
> - if (third) unmap_domain_page(third);
> - if (second) unmap_domain_page(second);
> - if (first) unmap_domain_page(first);
> + rc = func(&third[third_table_offset(paddr)], arg, level);
> + if ( rc != 0 )
> + goto err;
> +
> + paddr += PAGE_SIZE;
> + }
> +
> + rc = 0;
> +
> +err:
> + if ( third ) unmap_domain_page(third);
> + if ( second ) unmap_domain_page(second);
> + if ( first ) unmap_domain_page(first);
>
> spin_unlock(&p2m->lock);
>
> - return maddr;
> + return rc;
> +}
> +
> +struct p2m_lookup_t {
> + paddr_t paddr;
> + paddr_t maddr;
> +};
> +
> +static int p2m_lookup_f(lpae_t *ptep, void *arg, int level)
> +{
> + lpae_t pte;
> + struct p2m_lookup_t *p2m = (struct p2m_lookup_t *)arg;
> + ASSERT(level == 3);
> +
> + pte = *ptep;
> +
> + /* This bit must be one in the level 3 entry */
> + if ( !pte.p2m.table || !pte.p2m.valid )
> + return -EFAULT;

Ah, here's that check I was talking about earlier -- you could make
this generic I think.

> +
> + p2m->maddr = (pte.bits & PADDR_MASK & PAGE_MASK) |
> + (p2m->paddr & ~PAGE_MASK);
> + return 0;
> +}
> +/*
> + * Lookup the MFN corresponding to a domain's PFN.
> + *
> + * There are no processor functions to do a stage 2 only lookup therefore we
> + * do a a software walk.
> + */ > +paddr_t p2m_lookup(struct domain *d, paddr_t paddr) > +{ > + struct p2m_lookup_t p2m; > + p2m.paddr = paddr; > + p2m.maddr = INVALID_PADDR; > + > + p2m_walker(d, paddr, 0, p2m_lookup_f, &p2m); > + > + return p2m.maddr; > } > > int guest_physmap_mark_populate_on_demand(struct domain *d,
Ian Campbell
2013-Sep-10 09:30 UTC
Re: [PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:> guest_physmap_pin_range pins a range of guest pages so that their p2m > mappings won''t be changed. > guest_physmap_unpin_range unpins the previously pinned pages. > The pinning is done using one of the spare bits in the p2m ptes. > > Use the newly introduce p2m_walker to implement the two functions on > ARM. > Provide empty stubs for x86. > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > > Changes in v5: > - return -EBUSY when the P2M_DMA_PIN check fails; > - rename _guest_physmap_pin_range to pin_one_pte; > - rename _guest_physmap_unpin_range to unpin_one_pte. > > Changes in v4: > - use p2m_walker to implement guest_physmap_pin_range and > guest_physmap_unpin_range; > - return -EINVAL when the P2M_DMA_PIN check fails; > - change the printk into a gdprintk; > - add a comment on what type of page can be pinned. > --- > xen/arch/arm/p2m.c | 48 ++++++++++++++++++++++++++++++++++++++++++++ > xen/include/asm-arm/mm.h | 4 +++ > xen/include/asm-arm/page.h | 7 +++++- > xen/include/asm-x86/p2m.h | 12 +++++++++++ > 4 files changed, 70 insertions(+), 1 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index a9ceacf..bac6c7e 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -154,6 +154,46 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr) > return p2m.maddr; > } > > +static int pin_one_pte(lpae_t *ptep, void *arg, int level) > +{ > + lpae_t pte = *ptep; > + ASSERT(level == 3); > + > + if ( pte.p2m.avail & P2M_DMA_PIN ) > + return -EBUSY; > + pte.p2m.avail |= P2M_DMA_PIN; > + write_pte(ptep, pte); > + return 0; > +} > + > +int guest_physmap_pin_range(struct domain *d, > + xen_pfn_t gpfn, > + unsigned int order) > +{ > + return p2m_walker(d, gpfn << PAGE_SHIFT, order, > + pin_one_pte, NULL);If this fails then you will have left some subset of the pages successfully pinned. 
You need to clean it up I think, or be fatal to the guest or something.

> +}
> +
> +static int unpin_one_pte(lpae_t *ptep, void *arg, int level)
> +{
> + lpae_t pte = *ptep;
> + ASSERT(level == 3);
> +
> + if ( !pte.p2m.avail & P2M_DMA_PIN )
> + return -EBUSY;

The error here is that it isn't busy. EINVAL perhaps?

> + pte.p2m.avail &= ~P2M_DMA_PIN;
> + write_pte(ptep, pte);
> + return 0;
> +}
> +
> +int guest_physmap_unpin_range(struct domain *d,
> + xen_pfn_t gpfn,
> + unsigned int order)
> +{
> + return p2m_walker(d, gpfn << PAGE_SHIFT, order,
> + unpin_one_pte, NULL);
> +}
> +
> int guest_physmap_mark_populate_on_demand(struct domain *d,
> unsigned long gfn,
> unsigned int order)
> @@ -263,6 +303,14 @@ static int create_p2m_entries(struct domain *d,
> cur_second_offset = second_table_offset(addr);
> }
>
> + if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> + {
> + rc = -EINVAL;
> + gdprintk(XENLOG_WARNING, "cannot change p2m mapping for paddr=%"PRIpaddr
> + " domid=%d, the page is pinned\n", addr, d->domain_id);
> + goto out;
> + }
> +
> flush = third[third_table_offset(addr)].p2m.valid;
>
> /* Allocate a new RAM page and attach */
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index 41e9eff..c08cfca 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -153,11 +153,16 @@ typedef struct {
> unsigned long hint:1; /* In a block of 16 contiguous entries */
> unsigned long sbz2:1;
> unsigned long xn:1; /* eXecute-Never */
> - unsigned long avail:4; /* Ignored by hardware */
> + unsigned long avail:4; /* Ignored by hardware, see below */
>
> unsigned long sbz1:5;
> } __attribute__((__packed__)) lpae_p2m_t;
>
> +/* Xen "avail" bits allocation in third level entries */
> +#define P2M_DMA_PIN (1<<0) /* The page has been "pinned": the hypervisor
> + promises not to change the p2m mapping.
> + Only normal r/w guest RAM can be pinned.
> */

Where is that last requirement enforced?

> +
> /*
> * Walk is the common bits of p2m and pt entries which are needed to
> * simply walk the table (e.g. for debug).
Ian Campbell
2013-Sep-10 09:33 UTC
Re: [PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:
>
> +static int pin_one_pte(lpae_t *ptep, void *arg, int level)
> +{
> + lpae_t pte = *ptep;
> + ASSERT(level == 3);
> +
> + if ( pte.p2m.avail & P2M_DMA_PIN )
> + return -EBUSY;
> + pte.p2m.avail |= P2M_DMA_PIN;
> + write_pte(ptep, pte);
> + return 0;
> +}
> +
> +int guest_physmap_pin_range(struct domain *d,
> + xen_pfn_t gpfn,
> + unsigned int order)
> +{
> + return p2m_walker(d, gpfn << PAGE_SHIFT, order,
> + pin_one_pte, NULL);

Did we not also discuss accounting and limits on the amount of memory a
guest can lock down?
Ian Campbell
2013-Sep-10 09:47 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:> Introduce two new hypercalls XENMEM_exchange_and_pin and XENMEM_unpin. > > XENMEM_exchange_and_pin, it''s like XENMEM_exchange but it also pins the > new pages: their p2m mapping are guaranteed not to be changed, until > XENMEM_unpin is called. XENMEM_exchange_and_pin returns the DMA frame > numbers of the new pages to the caller, even if it''s an autotranslate > guest. > > The only effect of XENMEM_unpin is to "unpin" the previously > pinned pages. Afterwards the p2m mappings can be transparently changed by > the hypervisor as normal. The memory remains accessible from the guest. > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>This needs input from the x86 and common code maintainers, who you''ve not ccd. I''ve added them.> > > Changes in v5: > - memory_exchange: handle guest_physmap_pin_range failures; > - make i an unsigned long in unpinned; > - add nr_unpinned to xen_unpin to report partial success. > > Changes in v4: > - rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin; > - rename XENMEM_get_dma_buf to XENMEM_unpin; > - move the pinning before we copy back the mfn to the guest; > - propagate errors returned by guest_physmap_pin_range; > - use xen_memory_exchange_t as parameter for XENMEM_exchange_and_pin; > - use an unsigned iterator in unpin; > - improve the documentation of the new hypercalls; > - add a note about out.address_bits for XENMEM_exchange. 
> --- > xen/common/memory.c | 141 +++++++++++++++++++++++++++++++++---------- > xen/include/public/memory.h | 47 ++++++++++++++ > 2 files changed, 156 insertions(+), 32 deletions(-) > > diff --git a/xen/common/memory.c b/xen/common/memory.c > index 4b2f311..34169c1 100644 > --- a/xen/common/memory.c > +++ b/xen/common/memory.c > @@ -364,14 +364,14 @@ static void decrease_reservation(struct memop_args *a) > a->nr_done = i; > } > > -static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > +static long memory_exchange(int op, XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > { > struct xen_memory_exchange exch; > PAGE_LIST_HEAD(in_chunk_list); > PAGE_LIST_HEAD(out_chunk_list); > unsigned long in_chunk_order, out_chunk_order; > xen_pfn_t gpfn, gmfn, mfn; > - unsigned long i, j, k = 0; /* gcc ... */ > + unsigned long i = 0, j = 0, k = 0; /* gcc ... */ > unsigned int memflags = 0; > long rc = 0; > struct domain *d; > @@ -542,54 +542,72 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > /* Assign each output page to the domain. */ > for ( j = 0; (page = page_list_remove_head(&out_chunk_list)); ++j ) > { > - if ( assign_pages(d, page, exch.out.extent_order, > - MEMF_no_refcount) ) > - { > - unsigned long dec_count; > - bool_t drop_dom_ref; > - > - /* > - * Pages in in_chunk_list is stolen without > - * decreasing the tot_pages. If the domain is dying when > - * assign pages, we need decrease the count. For those pages > - * that has been assigned, it should be covered by > - * domain_relinquish_resources(). 
> - */
> - dec_count = (((1UL << exch.in.extent_order) *
> - (1UL << in_chunk_order)) -
> - (j * (1UL << exch.out.extent_order)));
> -
> - spin_lock(&d->page_alloc_lock);
> - domain_adjust_tot_pages(d, -dec_count);
> - drop_dom_ref = (dec_count && !d->tot_pages);
> - spin_unlock(&d->page_alloc_lock);
> -
> - if ( drop_dom_ref )
> - put_domain(d);
> -
> - free_domheap_pages(page, exch.out.extent_order);
> - goto dying;
> - }
> + unsigned long dec_count;
> + bool_t drop_dom_ref;
>
> if ( __copy_from_guest_offset(&gpfn, exch.out.extent_start,
> (i << out_chunk_order) + j, 1) )
> {
> rc = -EFAULT;
> - continue;
> + goto extent_error;
> }
>
> mfn = page_to_mfn(page);
> guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
>
> + if ( op == XENMEM_exchange_and_pin )
> + {
> + if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> + {
> + rc = -EFAULT;
> + goto extent_error_physmap;
> + }
> + }
> +
> if ( !paging_mode_translate(d) )
> {
> for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
> set_gpfn_from_mfn(mfn + k, gpfn + k);
> + }
> +
> + rc = assign_pages(d, page, exch.out.extent_order, MEMF_no_refcount);

This has moved after set_gpfn_from_mfn in the exchange case? Why?
Aren't we now pinning and setting the gpfn for pages which don't yet
belong to the guest?

I think any refactoring of error paths etc. would be best done as a
separate patch from the one adding the new hypercall.

> + if ( rc )
> + goto extent_error_physmap;
> +
> + if ( op == XENMEM_exchange_and_pin || !paging_mode_translate(d) )
> + {
> if ( __copy_to_guest_offset(exch.out.extent_start,
> (i << out_chunk_order) + j,
> &mfn, 1) )
> rc = -EFAULT;

After this we continue rather than erroring?

> }
> +
> + continue;
> +
> +extent_error_physmap:
> + guest_physmap_remove_page(d, gpfn, mfn, exch.out.extent_order);
> +extent_error:
> + /*
> + * Pages in in_chunk_list is stolen without
> + * decreasing the tot_pages. If the domain is dying when
> + * assign pages, we need decrease the count.
For those pages > + * that has been assigned, it should be covered by > + * domain_relinquish_resources(). > + */ > + dec_count = (((1UL << exch.in.extent_order) * > + (1UL << in_chunk_order)) - > + (j * (1UL << exch.out.extent_order))); > + > + spin_lock(&d->page_alloc_lock); > + domain_adjust_tot_pages(d, -dec_count); > + drop_dom_ref = (dec_count && !d->tot_pages); > + spin_unlock(&d->page_alloc_lock); > + > + if ( drop_dom_ref ) > + put_domain(d); > + > + free_domheap_pages(page, exch.out.extent_order); > + goto dying; > } > BUG_ON( !(d->is_dying) && (j != (1UL << out_chunk_order)) ); > } > @@ -627,6 +645,60 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > return rc; > } > > +static long unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg) > +{ > + int rc; > + unsigned long i; > + struct xen_unpin unpin; > + xen_pfn_t gpfn; > + struct domain *d; > + > + if ( copy_from_guest(&unpin, arg, 1) ) > + return -EFAULT; > + > + /* Various sanity checks. */ > + if ( /* Extent orders are sensible? */ > + (unpin.in.extent_order > MAX_ORDER) || > + /* Sizes of input list do not overflow a long? 
*/ > + ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) ) > + return -EFAULT; > + > + if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) ) > + return -EFAULT; > + > + d = rcu_lock_domain_by_any_id(unpin.in.domid); > + if ( d == NULL ) > + { > + rc = -ESRCH; > + goto fail; > + } > + > + for ( i = 0; i < unpin.in.nr_extents; i++ ) > + { > + if ( unlikely(__copy_from_guest_offset( > + &gpfn, unpin.in.extent_start, i, 1)) ) > + { > + rc = -EFAULT; > + goto partial; > + } > + > + rc = guest_physmap_unpin_range(d, gpfn, unpin.in.extent_order); > + if ( rc ) > + goto partial; > + } > + > + rc = 0; > + > +partial: > + unpin.nr_unpinned = i; > + if ( __copy_field_to_guest(arg, &unpin, nr_unpinned) ) > + rc = -EFAULT; > + > + fail: > + rcu_unlock_domain(d); > + return rc; > +} > + > long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) > { > struct domain *d; > @@ -715,8 +787,13 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) > > break; > > + case XENMEM_exchange_and_pin: > case XENMEM_exchange: > - rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t)); > + rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t)); > + break; > + > + case XENMEM_unpin: > + rc = unpin(guest_handle_cast(arg, xen_unpin_t)); > break; > > case XENMEM_maximum_ram_page: > diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h > index 7a26dee..73fdc31 100644 > --- a/xen/include/public/memory.h > +++ b/xen/include/public/memory.h > @@ -105,6 +105,7 @@ struct xen_memory_exchange { > /* > * [IN] Details of memory extents to be exchanged (GMFN bases). > * Note that @in.address_bits is ignored and unused. > + * @out.address_bits should contain the address mask for the new pages. > */ > struct xen_memory_reservation in; > > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t); > * The zero value is appropiate. 
> */
>
> +#define XENMEM_exchange_and_pin 26

All the other doc comments in this file precede their #define.

> +/*
> + * This hypercall is similar to XENMEM_exchange: it takes the same
> + * struct as an argument and it exchanges the pages passed in with a new
> + * set of pages. The new pages are going to be "pinned": it's guaranteed
> + * that their p2m mapping won't be changed until explicitly "unpinned".
> + * The content of the exchanged pages is lost.
> + * Only normal guest r/w memory can be pinned: no granted pages or
> + * ballooned pages.
> + * If return code is zero then @out.extent_list provides the DMA frame
> + * numbers of the newly-allocated memory.
> + * Returns zero on complete success, otherwise a negative error code:
> + * -ENOSYS if not implemented
> + * -EINVAL if the page is already pinned

Not -EBUSY?

> + * -EFAULT if an internal error occurs
> + * On complete success then always @nr_exchanged == @in.nr_extents. On
> + * partial success @nr_exchanged indicates how much work was done and a
> + * negative error code is returned.
> + */
> +
> +#define XENMEM_unpin 27
> +/*
> + * XENMEM_unpin unpins a set of pages, previously pinned by
> + * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can
> + * be transparently changed by the hypervisor, as usual. The pages are
> + * still accessible from the guest.
> + */
> +struct xen_unpin {
> + /*
> + * [IN] Details of memory extents to be unpinned (GMFN bases).
> + * Note that @in.address_bits is ignored and unused.
> + */
> + struct xen_memory_reservation in;
> + /*
> + * [OUT] Number of input extents that were successfully unpinned.
> + * 1. The first @nr_unpinned input extents were successfully
> + * unpinned.

Except if the code fails before doing any unpinning (e.g. the domain
lookup fails), in which case it is left as whatever the guest set it
to, or uninitialised (from the fail label in unpin()).
There are error codes (e.g. EFAULT) which are common to both cases, so
they are indistinguishable.

> + * 2. All other input extents are untouched.
> + * 3. If not all input exents are unpinned then the return code of this

"extents"

> + * command will be non-zero.
> + */
> + xen_ulong_t nr_unpinned;
> +};
> +typedef struct xen_unpin xen_unpin_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_unpin_t);
> +
> #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>
> #endif /* __XEN_PUBLIC_MEMORY_H__ */
Jan Beulich
2013-Sep-10 10:39 UTC
Re: [PATCH v5 1/4] xen: move steal_page and donate_page to common code
>>> On 09.09.13 at 18:06, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -230,6 +230,91 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
> return 1;
> }
>
> +int donate_page(
> + struct domain *d, struct page_info *page, unsigned int memflags)
> +{
> + spin_lock(&d->page_alloc_lock);
> +
> + if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) )
> + goto fail;
> +
> + if ( d->is_dying )
> + goto fail;
> +
> + if ( page->count_info & ~(PGC_allocated | 1) )

While the literal 1 was acceptable in arch specific code, I'm not sure
it really is in generic code. Nothing enforces that an arch has the
count starting at bit 0. Even if it reads a little odd, using
(-PGC_count_mask & PGC_count_mask) would probably be an acceptable
replacement here and further down.

> + goto fail;
> +
> + if ( !(memflags & MEMF_no_refcount) )
> + {
> + if ( d->tot_pages >= d->max_pages )
> + goto fail;
> + domain_adjust_tot_pages(d, 1);
> + }
> +
> + page->count_info = PGC_allocated | 1;
> + page_set_owner(page, d);
> + page_list_add_tail(page,&d->page_list);

Please use the opportunity and insert a blank after the comma, the more
that you're not doing a literal copy of the function anyway.

> +
> + spin_unlock(&d->page_alloc_lock);
> + return 0;
> +
> + fail:
> + spin_unlock(&d->page_alloc_lock);
> + gdprintk(XENLOG_WARNING, "Bad donate %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%016lx",
> + (void *)page_to_mfn(page), d, d->domain_id,
> + page_get_owner(page), page->count_info, page->u.inuse.type_info);

One of the (unfortunately many) cases where gdprintk() is wrong: You
don't care about the current domain here. Also casting page_to_mfn()
to void * in common code seems fragile.
I wonder why this cast was there in the original code: Using %lx should
be quite fine here.

> + return -1;
> +}
> +
> +int steal_page(
> + struct domain *d, struct page_info *page, unsigned int memflags)
> +{
> + unsigned long x, y;
> + bool_t drop_dom_ref = 0;
> +
> + spin_lock(&d->page_alloc_lock);
> +
> + if ( is_xen_heap_page(page) || (page_get_owner(page) != d) )
> + goto fail;
> +
> + /*
> + * We require there is just one reference (PGC_allocated). We temporarily
> + * drop this reference now so that we can safely swizzle the owner.
> + */
> + y = page->count_info;
> + do {
> + x = y;
> + if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
> + goto fail;
> + y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
> + } while ( y != x );
> +
> + /* Swizzle the owner then reinstate the PGC_allocated reference. */
> + page_set_owner(page, NULL);
> + y = page->count_info;
> + do {
> + x = y;
> + BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
> + } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
> +
> + /* Unlink from original owner. */
> + if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
> + drop_dom_ref = 1;
> + page_list_del(page, &d->page_list);
> +
> + spin_unlock(&d->page_alloc_lock);
> + if ( unlikely(drop_dom_ref) )
> + put_domain(d);
> + return 0;
> +
> + fail:
> + spin_unlock(&d->page_alloc_lock);
> + printk("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%lx\n",
> + (void *)page_to_mfn(page), d, d->domain_id,
> + page_get_owner(page), page->count_info, page->u.inuse.type_info);

This clearly lacks a XENLOG_G_<something>. It would also be desirable
to have the word "steal" in the message somewhere, to make it less
generic.

Jan
Jan Beulich
2013-Sep-10 12:15 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
>>> On 09.09.13 at 18:06, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote: > +static long memory_exchange(int op, XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > { > struct xen_memory_exchange exch; > PAGE_LIST_HEAD(in_chunk_list); > PAGE_LIST_HEAD(out_chunk_list); > unsigned long in_chunk_order, out_chunk_order; > xen_pfn_t gpfn, gmfn, mfn; > - unsigned long i, j, k = 0; /* gcc ... */ > + unsigned long i = 0, j = 0, k = 0; /* gcc ... */This is unnecessary afaict.> @@ -542,54 +542,72 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg) > /* Assign each output page to the domain. */ > for ( j = 0; (page = page_list_remove_head(&out_chunk_list)); ++j ) > { > - if ( assign_pages(d, page, exch.out.extent_order, > - MEMF_no_refcount) ) > - { > - unsigned long dec_count; > - bool_t drop_dom_ref; > - > - /* > - * Pages in in_chunk_list is stolen without > - * decreasing the tot_pages. If the domain is dying when > - * assign pages, we need decrease the count. For those pages > - * that has been assigned, it should be covered by > - * domain_relinquish_resources(). > - */ > - dec_count = (((1UL << exch.in.extent_order) * > - (1UL << in_chunk_order)) - > - (j * (1UL << exch.out.extent_order))); > - > - spin_lock(&d->page_alloc_lock); > - domain_adjust_tot_pages(d, -dec_count); > - drop_dom_ref = (dec_count && !d->tot_pages); > - spin_unlock(&d->page_alloc_lock); > - > - if ( drop_dom_ref ) > - put_domain(d); > - > - free_domheap_pages(page, exch.out.extent_order); > - goto dying; > - } > + unsigned long dec_count; > + bool_t drop_dom_ref; > > if ( __copy_from_guest_offset(&gpfn, exch.out.extent_start, > (i << out_chunk_order) + j, 1) ) > { > rc = -EFAULT; > - continue; > + goto extent_error;No, nothing was undone in this case before. And _if_ the re-ordering is provably correct (which I doubt at this point), you''d need to continue to assign_pages() here to retain previous behavior. 
Or if you think current behavior is wrong, please send a (perhaps prerequisite) separate patch to address that, which would at once make sure we get a proper explanation of what you think is wrong.> } > > mfn = page_to_mfn(page); > guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);As Ian already pointed out, there''s a window now where the guest has the new page(s) in its physmap, but not assigned to it yet.> > + if ( op == XENMEM_exchange_and_pin ) > + { > + if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) ) > + { > + rc = -EFAULT; > + goto extent_error_physmap; > + } > + } > + > if ( !paging_mode_translate(d) ) > { > for ( k = 0; k < (1UL << exch.out.extent_order); k++ ) > set_gpfn_from_mfn(mfn + k, gpfn + k);Again, _if_ the error path changes are provably correct, wouldn''t this need undoing on the error path too? That wasn''t necessary before your change because we ran the handling of the current extent to completion.> + } > + > + rc = assign_pages(d, page, exch.out.extent_order, MEMF_no_refcount); > + if ( rc ) > + goto extent_error_physmap; > + > + if ( op == XENMEM_exchange_and_pin || !paging_mode_translate(d) > ) > + { > if ( __copy_to_guest_offset(exch.out.extent_start, > (i << out_chunk_order) + j, > &mfn, 1) ) > rc = -EFAULT; > } > + > + continue; > + > +extent_error_physmap: > + guest_physmap_remove_page(d, gpfn, mfn, exch.out.extent_order); > +extent_error: > + /* > + * Pages in in_chunk_list is stolen without > + * decreasing the tot_pages. If the domain is dying when > + * assign pages, we need decrease the count. For those pages > + * that has been assigned, it should be covered by > + * domain_relinquish_resources(). 
> + */
> + dec_count = (((1UL << exch.in.extent_order) *
> + (1UL << in_chunk_order)) -
> + (j * (1UL << exch.out.extent_order)));
> +
> + spin_lock(&d->page_alloc_lock);
> + domain_adjust_tot_pages(d, -dec_count);
> + drop_dom_ref = (dec_count && !d->tot_pages);
> + spin_unlock(&d->page_alloc_lock);
> +
> + if ( drop_dom_ref )
> + put_domain(d);
> +
> + free_domheap_pages(page, exch.out.extent_order);
> + goto dying;

I don't think moving the error handling here helps readability or such.

> +static long unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg)
> +{
> + int rc;

Please keep function return type and return code variable type in sync
- "int" would be sufficient for both here.

> + unsigned long i;
> + struct xen_unpin unpin;
> + xen_pfn_t gpfn;
> + struct domain *d;
> +
> + if ( copy_from_guest(&unpin, arg, 1) )
> + return -EFAULT;
> +
> + /* Various sanity checks. */
> + if ( /* Extent orders are sensible? */
> + (unpin.in.extent_order > MAX_ORDER) ||
> + /* Sizes of input list do not overflow a long? */
> + ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) )
> + return -EFAULT;
> +
> + if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) )
> + return -EFAULT;
> +
> + d = rcu_lock_domain_by_any_id(unpin.in.domid);
> + if ( d == NULL )
> + {
> + rc = -ESRCH;
> + goto fail;
> + }
> +
> + for ( i = 0; i < unpin.in.nr_extents; i++ )
> + {
> + if ( unlikely(__copy_from_guest_offset(
> + &gpfn, unpin.in.extent_start, i, 1)) )
> + {
> + rc = -EFAULT;
> + goto partial;
> + }
> +

Want/need to make sure gpfn is suitably aligned for the passed in
extent order? Depends on whether guest_physmap_unpin_range() is
expected to always cope with a mis-aligned one.

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -105,6 +105,7 @@ struct xen_memory_exchange {
> /*
> * [IN] Details of memory extents to be exchanged (GMFN bases).
> * Note that @in.address_bits is ignored and unused.
> + * @out.address_bits should contain the address mask for the new pages.

"mask"? No comment is likely better than a confusing one.

> @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> * The zero value is appropiate.
> */
>
> +#define XENMEM_exchange_and_pin 26
> +/*
> + * This hypercall is similar to XENMEM_exchange: it takes the same
> + * struct as an argument and it exchanges the pages passed in with a new
> + * set of pages. The new pages are going to be "pinned": it's guaranteed
> + * that their p2m mapping won't be changed until explicitly "unpinned".
> + * The content of the exchanged pages is lost.
> + * Only normal guest r/w memory can be pinned: no granted pages or
> + * ballooned pages.
> + * If return code is zero then @out.extent_list provides the DMA frame
> + * numbers of the newly-allocated memory.

"DMA"? I don't think that term is universally true across all possible
architectures (but we're in an architecture independent header here).
"Machine" would probably be better (as it implies CPU perspective,
whereas DMA hints at device perspective).

> + * Returns zero on complete success, otherwise a negative error code:
> + * -ENOSYS if not implemented
> + * -EINVAL if the page is already pinned
> + * -EFAULT if an internal error occurs

I'm not sure enumerating error codes here is useful; certainly not
giving the impression that the list might be exhaustive.

Jan
Ian Campbell
2013-Sep-10 12:50 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
On Tue, 2013-09-10 at 13:15 +0100, Jan Beulich wrote:
> > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> > * The zero value is appropiate.
> > */
> >
> > +#define XENMEM_exchange_and_pin 26
> > +/*
> > + * This hypercall is similar to XENMEM_exchange: it takes the same
> > + * struct as an argument and it exchanges the pages passed in with a new
> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
> > + * that their p2m mapping won't be changed until explicitly "unpinned".
> > + * The content of the exchanged pages is lost.
> > + * Only normal guest r/w memory can be pinned: no granted pages or
> > + * ballooned pages.
> > + * If return code is zero then @out.extent_list provides the DMA frame
> > + * numbers of the newly-allocated memory.
>
> "DMA"? I don't think that term is universally true across all possible
> architectures (but we're in an architecture independent header
> here). "Machine" would probably be better (as it implies CPU
> perspective, whereas DMA hints at device perspective).

I think DMA here is correct. The purpose of exchange and pin is so that
the page can be safely handed to a device for DMA.

On an architecture where DMA address != Machine address then this
should indeed return the DMA addresses.

Ian.
Jan Beulich
2013-Sep-10 13:28 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
>>> On 10.09.13 at 14:50, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2013-09-10 at 13:15 +0100, Jan Beulich wrote:
>
>> > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
>> > * The zero value is appropiate.
>> > */
>> >
>> > +#define XENMEM_exchange_and_pin 26
>> > +/*
>> > + * This hypercall is similar to XENMEM_exchange: it takes the same
>> > + * struct as an argument and it exchanges the pages passed in with a new
>> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
>> > + * that their p2m mapping won't be changed until explicitly "unpinned".
>> > + * The content of the exchanged pages is lost.
>> > + * Only normal guest r/w memory can be pinned: no granted pages or
>> > + * ballooned pages.
>> > + * If return code is zero then @out.extent_list provides the DMA frame
>> > + * numbers of the newly-allocated memory.
>>
>> "DMA"? I don't think that term is universally true across all possible
>> architectures (but we're in an architecture independent header
>> here). "Machine" would probably be better (as it implies CPU
>> perspective, whereas DMA hints at device perspective).
>
> I think DMA here is correct. The purpose of exchange and pin is so that
> the page can be safely handed to a device for DMA.
>
> On an architecture where DMA address != Machine address then this should
> indeed return the DMA addresses.

One problem is that I think there are architectures where there's no
single canonical DMA address; such an address may depend on the
placement of a device in the system's topology. Hence I don't think it
would even be possible to return "the" DMA address here. It ought to be
the machine address (CPU view), and the consumer ought to know how to
translate this to a DMA address for a particular device.

Jan
Stefano Stabellini
2013-Sep-10 17:20 UTC
Re: [PATCH v5 1/4] xen: move steal_page and donate_page to common code
On Tue, 10 Sep 2013, Jan Beulich wrote:
> >>> On 09.09.13 at 18:06, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -230,6 +230,91 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
> >      return 1;
> >  }
> >
> > +int donate_page(
> > +    struct domain *d, struct page_info *page, unsigned int memflags)
> > +{
> > +    spin_lock(&d->page_alloc_lock);
> > +
> > +    if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) )
> > +        goto fail;
> > +
> > +    if ( d->is_dying )
> > +        goto fail;
> > +
> > +    if ( page->count_info & ~(PGC_allocated | 1) )
>
> While the literal 1 was acceptable in arch specific code, I'm not sure
> it really is in generic code. Nothing enforces that an arch has the
> count starting at bit 0. Even if it reads a little odd, using
> (-PGC_count_mask & PGC_count_mask) would probably be an
> acceptable replacement here and further down.

(-PGC_count_mask & PGC_count_mask) is always 0.

I guess you mean:

    if ( page->count_info & ~(PGC_count_mask | PGC_allocated) )
Stefano Stabellini
2013-Sep-10 18:00 UTC
Re: [PATCH v5 2/4] xen/arm: introduce a generic p2m walker and use it in p2m_lookup
On Tue, 10 Sep 2013, Ian Campbell wrote:
> This function is not actually all the p2m specific in the end, by using
> lpae_t.walk instead of lpae_t.pt (see dump_pt_walk) you could make it
> useful for e.g. hyp pagetable walks too. (e.g. dump_hyp_walk). In theory
> it could also be used for LPAE guest walks too, but we'd need a separate
> walker for non-LPAE guests.

I could move the function somewhere more generic but..

> If we wanted to replace dump_hyp_walk with this then calling the
> callback for each level would be required...

even though the difference is not likely to be big, still it would slow
down p2m_walker, that could even be called as often as twice per DMA
request. Maybe it's best to keep it as it is?

> >      spin_lock(&p2m->lock);
> >
> >      first = __map_domain_page(p2m->first_level);
> >
> > -    pte = first[first_table_offset(paddr)];
> > -    if ( !pte.p2m.valid || !pte.p2m.table )
> > -        goto done;
> > +    if ( !first ||
> > +         !first[first_table_offset(paddr)].p2m.valid )
> > +        goto err;
>
> Why is the first level handled specially outside the loop? What happens
> if order is such that we span multiple first level entries?

I think that the assumption in the original code was that the size
couldn't be larger than 1GB. It isn't the case anymore, I'll fix this.

> > -        /* This bit must be one in the level 3 entry */
> > -        if ( !pte.p2m.table )
> > -            pte.bits = 0;
> > +        if ( cur_first_offset != first_table_offset(paddr) )
> > +        {
> > +            if (second) unmap_domain_page(second);
> > +            second = map_domain_page(first[first_table_offset(paddr)].p2m.base);
> > +            cur_first_offset = first_table_offset(paddr);
> > +        }
> > +        level++;
> > +        if ( !second ||
>
> ASSERT(second) seems reasonable here, I think, since it would indicate
> we had screwed up the p2m pretty badly.

I think it's possible: what if a memory range is simply missing from the
p2m? The second level pagetable pages could be missing too.
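[Editorial note: the overall shape of the walker being discussed — descend through table entries, invoke a callback on each leaf, abort on the first callback failure — can be modelled outside Xen. The following is a hypothetical, self-contained C sketch with an invented toy pte layout and table size; it is not the actual lpae_t code, only an illustration of the propagate-return-codes behaviour added in v5.]

```c
#include <stdint.h>

#define ENTRIES 4                     /* toy table width, for illustration */

/* Invented pte layout, loosely modelled on the discussion above. */
typedef struct pte {
    int valid;                        /* entry maps something */
    int table;                        /* entry points at a next-level table */
    struct pte *next;                 /* next-level table, if 'table' is set */
} pte_t;

typedef int (*pte_cb)(pte_t *ptep, void *arg, int level);

/* Walk one table at 'level'; call cb on every valid leaf entry.
 * A nonzero return from cb aborts the walk and is propagated,
 * mirroring how p2m_walker is described as returning error codes. */
static int walk_table(pte_t *tbl, int level, pte_cb cb, void *arg)
{
    for ( int i = 0; i < ENTRIES; i++ )
    {
        pte_t *p = &tbl[i];
        if ( !p->valid )
            continue;                 /* hole: nothing mapped here */
        if ( p->table )
        {
            int rc = walk_table(p->next, level + 1, cb, arg);
            if ( rc )
                return rc;
        }
        else
        {
            int rc = cb(p, arg, level);
            if ( rc )
                return rc;
        }
    }
    return 0;
}

/* Example callback: count the leaf entries visited. */
static int count_leaves(pte_t *ptep, void *arg, int level)
{
    (void)ptep; (void)level;
    ++*(int *)arg;
    return 0;
}
```

Note how a hole (valid bit clear) is simply skipped — which is the point Ian makes below: a missing second-level table should manifest as an invalid entry one level up, not as a null mapping.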
Tim Deegan
2013-Sep-10 18:06 UTC
Re: [PATCH v5 1/4] xen: move steal_page and donate_page to common code
At 18:20 +0100 on 10 Sep (1378837226), Stefano Stabellini wrote:
> On Tue, 10 Sep 2013, Jan Beulich wrote:
> > >>> On 09.09.13 at 18:06, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > > --- a/xen/common/memory.c
> > > +++ b/xen/common/memory.c
> > > @@ -230,6 +230,91 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
> > >      return 1;
> > >  }
> > >
> > > +int donate_page(
> > > +    struct domain *d, struct page_info *page, unsigned int memflags)
> > > +{
> > > +    spin_lock(&d->page_alloc_lock);
> > > +
> > > +    if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) )
> > > +        goto fail;
> > > +
> > > +    if ( d->is_dying )
> > > +        goto fail;
> > > +
> > > +    if ( page->count_info & ~(PGC_allocated | 1) )
> >
> > While the literal 1 was acceptable in arch specific code, I'm not sure
> > it really is in generic code. Nothing enforces that an arch has the
> > count starting at bit 0. Even if it reads a little odd, using
> > (-PGC_count_mask & PGC_count_mask) would probably be an
> > acceptable replacement here and further down.
>
> (-PGC_count_mask & PGC_count_mask) is always 0.

'x & -x' always gives you the lowest set bit in x, though as Jan points
out it's a bit opaque. Maybe it needs a friendly macro name?

Tim.
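[Editorial note: Tim's 'x & -x' identity is easy to verify in isolation. A standalone C sketch — the PGC_count_mask value below is invented (bits 8..23) purely for illustration; the real mask is architecture-defined:]

```c
/* For any unsigned value, two's-complement negation flips every bit
 * above the lowest set bit, so x & -x isolates exactly that bit.
 * With a hypothetical count mask occupying bits 8..23, Jan's
 * (-PGC_count_mask & PGC_count_mask) evaluates to 1UL << 8, i.e.
 * "a count of exactly 1" in mask terms -- not 0. */

#define PGC_count_mask    (((1UL << 16) - 1) << 8)   /* invented: bits 8..23 */
#define LOWEST_SET_BIT(x) ((x) & -(x))               /* a candidate "friendly macro" */

/* Function form of the same identity; 0 maps to 0. */
static unsigned long lowest_set_bit(unsigned long x)
{
    return x & -x;
}
```

This also makes Stefano's misreading concrete: ~PGC_count_mask is the complement of the mask (everything outside it), whereas -PGC_count_mask & PGC_count_mask is the mask's lowest bit.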
Stefano Stabellini
2013-Sep-10 18:14 UTC
Re: [PATCH v5 1/4] xen: move steal_page and donate_page to common code
On Tue, 10 Sep 2013, Tim Deegan wrote:
> At 18:20 +0100 on 10 Sep (1378837226), Stefano Stabellini wrote:
> > On Tue, 10 Sep 2013, Jan Beulich wrote:
> > > >>> On 09.09.13 at 18:06, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > > > --- a/xen/common/memory.c
> > > > +++ b/xen/common/memory.c
> > > > @@ -230,6 +230,91 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
> > > >      return 1;
> > > >  }
> > > >
> > > > +int donate_page(
> > > > +    struct domain *d, struct page_info *page, unsigned int memflags)
> > > > +{
> > > > +    spin_lock(&d->page_alloc_lock);
> > > > +
> > > > +    if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) )
> > > > +        goto fail;
> > > > +
> > > > +    if ( d->is_dying )
> > > > +        goto fail;
> > > > +
> > > > +    if ( page->count_info & ~(PGC_allocated | 1) )
> > >
> > > While the literal 1 was acceptable in arch specific code, I'm not sure
> > > it really is in generic code. Nothing enforces that an arch has the
> > > count starting at bit 0. Even if it reads a little odd, using
> > > (-PGC_count_mask & PGC_count_mask) would probably be an
> > > acceptable replacement here and further down.
> >
> > (-PGC_count_mask & PGC_count_mask) is always 0.
>
> 'x & -x' always gives you the lowest set bit in x, though as Jan points
> out it's a bit opaque. Maybe it needs a friendly macro name?

No that's fine. I read it as ~PGC_count_mask. It proves that I need new
glasses or a larger font size :P
Stefano Stabellini
2013-Sep-10 18:32 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
On Tue, 10 Sep 2013, Jan Beulich wrote:
> >>> On 10.09.13 at 14:50, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Tue, 2013-09-10 at 13:15 +0100, Jan Beulich wrote:
> >
> >> > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> >> >  * The zero value is appropiate.
> >> >  */
> >> >
> >> > +#define XENMEM_exchange_and_pin 26
> >> > +/*
> >> > + * This hypercall is similar to XENMEM_exchange: it takes the same
> >> > + * struct as an argument and it exchanges the pages passed in with a new
> >> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
> >> > + * that their p2m mapping won't be changed until explicitly "unpinned".
> >> > + * The content of the exchanged pages is lost.
> >> > + * Only normal guest r/w memory can be pinned: no granted pages or
> >> > + * ballooned pages.
> >> > + * If return code is zero then @out.extent_list provides the DMA frame
> >> > + * numbers of the newly-allocated memory.
> >>
> >> "DMA"? I don't think that term is universally true across all possible
> >> architectures (but we're in an architecture independent header
> >> here). "Machine" would probably be better (as it implies CPU
> >> perspective, whereas DMA hints at device perspective).
> >
> > I think DMA here is correct. The purpose of exchange and pin is so that
> > the page can be safely handed to a device for DMA.
> >
> > On an architecture where DMA address != Machine address then this should
> > indeed return the DMA addresses.
>
> One problem is that I think there are architectures where there's no
> single canonical DMA address; such an address may depend on the
> placement of a device in the system's topology. Hence I don't think
> it would even be possible to return "the" DMA address here. It ought
> to be the machine address (CPU view), and the consumer ought to
> know how to translate this to a DMA address for a particular device.

We could leave it up to each architecture to specify whether the
hypercall returns a DMA address or a Machine address (according to
their definition of DMA address and Machine address).
Even if this is a common header.
Tim Deegan
2013-Sep-10 18:37 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
At 19:32 +0100 on 10 Sep (1378841575), Stefano Stabellini wrote:
> On Tue, 10 Sep 2013, Jan Beulich wrote:
> > >>> On 10.09.13 at 14:50, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > > On Tue, 2013-09-10 at 13:15 +0100, Jan Beulich wrote:
> > >
> > >> > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> > >> >  * The zero value is appropiate.
> > >> >  */
> > >> >
> > >> > +#define XENMEM_exchange_and_pin 26
> > >> > +/*
> > >> > + * This hypercall is similar to XENMEM_exchange: it takes the same
> > >> > + * struct as an argument and it exchanges the pages passed in with a new
> > >> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
> > >> > + * that their p2m mapping won't be changed until explicitly "unpinned".
> > >> > + * The content of the exchanged pages is lost.
> > >> > + * Only normal guest r/w memory can be pinned: no granted pages or
> > >> > + * ballooned pages.
> > >> > + * If return code is zero then @out.extent_list provides the DMA frame
> > >> > + * numbers of the newly-allocated memory.
> > >>
> > >> "DMA"? I don't think that term is universally true across all possible
> > >> architectures (but we're in an architecture independent header
> > >> here). "Machine" would probably be better (as it implies CPU
> > >> perspective, whereas DMA hints at device perspective).
> > >
> > > I think DMA here is correct. The purpose of exchange and pin is so that
> > > the page can be safely handed to a device for DMA.
> > >
> > > On an architecture where DMA address != Machine address then this should
> > > indeed return the DMA addresses.
> >
> > One problem is that I think there are architectures where there's no
> > single canonical DMA address; such an address may depend on the
> > placement of a device in the system's topology. Hence I don't think
> > it would even be possible to return "the" DMA address here. It ought
> > to be the machine address (CPU view), and the consumer ought to
> > know how to translate this to a DMA address for a particular device.
>
> We could leave it up to each architecture to specify whether the
> hypercall returns a DMA address or a Machine address (according to
> their definition of DMA address and Machine address).
> Even if this is a common header.

Is this going to be needed on architectures other than arm? It's not
useful for x86, AFAICT.

Tim.
Ian Campbell
2013-Sep-10 18:38 UTC
Re: [PATCH v5 2/4] xen/arm: introduce a generic p2m walker and use it in p2m_lookup
On Tue, 2013-09-10 at 19:00 +0100, Stefano Stabellini wrote:
> > > -        /* This bit must be one in the level 3 entry */
> > > -        if ( !pte.p2m.table )
> > > -            pte.bits = 0;
> > > +        if ( cur_first_offset != first_table_offset(paddr) )
> > > +        {
> > > +            if (second) unmap_domain_page(second);
> > > +            second = map_domain_page(first[first_table_offset(paddr)].p2m.base);
> > > +            cur_first_offset = first_table_offset(paddr);
> > > +        }
> > > +        level++;
> > > +        if ( !second ||
> >
> > ASSERT(second) seems reasonable here, I think, since it would indicate
> > we had screwed up the p2m pretty badly.
>
> I think it's possible: what if a memory range is simply missing from the
> p2m? The second level pagetable pages could be missing too.

Then the level above it shouldn't have the valid bit set.

Ian.
Ian Campbell
2013-Sep-10 18:40 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
On Tue, 2013-09-10 at 19:37 +0100, Tim Deegan wrote:
> At 19:32 +0100 on 10 Sep (1378841575), Stefano Stabellini wrote:
> > On Tue, 10 Sep 2013, Jan Beulich wrote:
> > > >>> On 10.09.13 at 14:50, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > > > On Tue, 2013-09-10 at 13:15 +0100, Jan Beulich wrote:
> > > >
> > > >> > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> > > >> >  * The zero value is appropiate.
> > > >> >  */
> > > >> >
> > > >> > +#define XENMEM_exchange_and_pin 26
> > > >> > +/*
> > > >> > + * This hypercall is similar to XENMEM_exchange: it takes the same
> > > >> > + * struct as an argument and it exchanges the pages passed in with a new
> > > >> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
> > > >> > + * that their p2m mapping won't be changed until explicitly "unpinned".
> > > >> > + * The content of the exchanged pages is lost.
> > > >> > + * Only normal guest r/w memory can be pinned: no granted pages or
> > > >> > + * ballooned pages.
> > > >> > + * If return code is zero then @out.extent_list provides the DMA frame
> > > >> > + * numbers of the newly-allocated memory.
> > > >>
> > > >> "DMA"? I don't think that term is universally true across all possible
> > > >> architectures (but we're in an architecture independent header
> > > >> here). "Machine" would probably be better (as it implies CPU
> > > >> perspective, whereas DMA hints at device perspective).
> > > >
> > > > I think DMA here is correct. The purpose of exchange and pin is so that
> > > > the page can be safely handed to a device for DMA.
> > > >
> > > > On an architecture where DMA address != Machine address then this should
> > > > indeed return the DMA addresses.
> > >
> > > One problem is that I think there are architectures where there's no
> > > single canonical DMA address; such an address may depend on the
> > > placement of a device in the system's topology. Hence I don't think
> > > it would even be possible to return "the" DMA address here. It ought
> > > to be the machine address (CPU view), and the consumer ought to
> > > know how to translate this to a DMA address for a particular device.
> >
> > We could leave it up to each architecture to specify whether the
> > hypercall returns a DMA address or a Machine address (according to
> > their definition of DMA address and Machine address).
> > Even if this is a common header.
>
> Is this going to be needed on architectures other than arm? It's not
> useful for x86, AFAICT.

It might be needed for PVH with passthrough? Maybe that insists on an
IOMMU, in which case maybe it is arm only.

Until someone does e.g. a ppc port ;-)

Ian.
Tim Deegan
2013-Sep-10 18:50 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
At 19:40 +0100 on 10 Sep (1378842048), Ian Campbell wrote:
> On Tue, 2013-09-10 at 19:37 +0100, Tim Deegan wrote:
> > At 19:32 +0100 on 10 Sep (1378841575), Stefano Stabellini wrote:
> > > On Tue, 10 Sep 2013, Jan Beulich wrote:
> > > > >>> On 10.09.13 at 14:50, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > > > > On Tue, 2013-09-10 at 13:15 +0100, Jan Beulich wrote:
> > > > >
> > > > >> > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> > > > >> >  * The zero value is appropiate.
> > > > >> >  */
> > > > >> >
> > > > >> > +#define XENMEM_exchange_and_pin 26
> > > > >> > +/*
> > > > >> > + * This hypercall is similar to XENMEM_exchange: it takes the same
> > > > >> > + * struct as an argument and it exchanges the pages passed in with a new
> > > > >> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
> > > > >> > + * that their p2m mapping won't be changed until explicitly "unpinned".
> > > > >> > + * The content of the exchanged pages is lost.
> > > > >> > + * Only normal guest r/w memory can be pinned: no granted pages or
> > > > >> > + * ballooned pages.
> > > > >> > + * If return code is zero then @out.extent_list provides the DMA frame
> > > > >> > + * numbers of the newly-allocated memory.
> > > > >>
> > > > >> "DMA"? I don't think that term is universally true across all possible
> > > > >> architectures (but we're in an architecture independent header
> > > > >> here). "Machine" would probably be better (as it implies CPU
> > > > >> perspective, whereas DMA hints at device perspective).
> > > > >
> > > > > I think DMA here is correct. The purpose of exchange and pin is so that
> > > > > the page can be safely handed to a device for DMA.
> > > > >
> > > > > On an architecture where DMA address != Machine address then this should
> > > > > indeed return the DMA addresses.
> > > >
> > > > One problem is that I think there are architectures where there's no
> > > > single canonical DMA address; such an address may depend on the
> > > > placement of a device in the system's topology. Hence I don't think
> > > > it would even be possible to return "the" DMA address here. It ought
> > > > to be the machine address (CPU view), and the consumer ought to
> > > > know how to translate this to a DMA address for a particular device.
> > >
> > > We could leave it up to each architecture to specify whether the
> > > hypercall returns a DMA address or a Machine address (according to
> > > their definition of DMA address and Machine address).
> > > Even if this is a common header.
> >
> > Is this going to be needed on architectures other than arm? It's not
> > useful for x86, AFAICT.
>
> It might be needed for PVH with passthrough? Maybe that insists on an
> IOMMU, in which case maybe it is arm only.

Yes, I think it would be fine to require a working IOMMU on x86 for
PVH+passthrough.

> Until someone does e.g. a ppc port ;-)

m68k!

Tim.
Stefano Stabellini
2013-Sep-10 19:11 UTC
Re: [PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
On Tue, 10 Sep 2013, Ian Campbell wrote:
> On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:
> > guest_physmap_pin_range pins a range of guest pages so that their p2m
> > mappings won't be changed.
> > guest_physmap_unpin_range unpins the previously pinned pages.
> > The pinning is done using one of the spare bits in the p2m ptes.
> >
> > Use the newly introduce p2m_walker to implement the two functions on
> > ARM.
> > Provide empty stubs for x86.
> >
> >
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> >
> >
> > Changes in v5:
> > - return -EBUSY when the P2M_DMA_PIN check fails;
> > - rename _guest_physmap_pin_range to pin_one_pte;
> > - rename _guest_physmap_unpin_range to unpin_one_pte.
> >
> > Changes in v4:
> > - use p2m_walker to implement guest_physmap_pin_range and
> >   guest_physmap_unpin_range;
> > - return -EINVAL when the P2M_DMA_PIN check fails;
> > - change the printk into a gdprintk;
> > - add a comment on what type of page can be pinned.
> > ---
> >  xen/arch/arm/p2m.c         |   48 ++++++++++++++++++++++++++++++++++++++++++++
> >  xen/include/asm-arm/mm.h   |    4 +++
> >  xen/include/asm-arm/page.h |    7 +++++-
> >  xen/include/asm-x86/p2m.h  |   12 +++++++++++
> >  4 files changed, 70 insertions(+), 1 deletions(-)
> >
> > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > index a9ceacf..bac6c7e 100644
> > --- a/xen/arch/arm/p2m.c
> > +++ b/xen/arch/arm/p2m.c
> > @@ -154,6 +154,46 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr)
> >      return p2m.maddr;
> >  }
> >
> > +static int pin_one_pte(lpae_t *ptep, void *arg, int level)
> > +{
> > +    lpae_t pte = *ptep;
> > +    ASSERT(level == 3);
> > +
> > +    if ( pte.p2m.avail & P2M_DMA_PIN )
> > +        return -EBUSY;
> > +    pte.p2m.avail |= P2M_DMA_PIN;
> > +    write_pte(ptep, pte);
> > +    return 0;
> > +}
> > +
> > +int guest_physmap_pin_range(struct domain *d,
> > +                            xen_pfn_t gpfn,
> > +                            unsigned int order)
> > +{
> > +    return p2m_walker(d, gpfn << PAGE_SHIFT, order,
> > +                      pin_one_pte, NULL);
>
> If this fails then you will have left some subset of the pages
> successfully pinned. You need to clean it up I think, or be fatal to the
> guest or something.

Making the failure of guest_physmap_pin_range fatal to the guest would
also solve the non-trivial problem of error handling in memory_exchange.
Jan, do you agree on this approach?
Ian Campbell
2013-Sep-10 19:42 UTC
Re: [PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
On Tue, 2013-09-10 at 20:11 +0100, Stefano Stabellini wrote:
> On Tue, 10 Sep 2013, Ian Campbell wrote:
> > On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:
> > > guest_physmap_pin_range pins a range of guest pages so that their p2m
> > > mappings won't be changed.
> > > guest_physmap_unpin_range unpins the previously pinned pages.
> > > The pinning is done using one of the spare bits in the p2m ptes.
> > >
> > > Use the newly introduce p2m_walker to implement the two functions on
> > > ARM.
> > > Provide empty stubs for x86.
> > >
> > >
> > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > >
> > >
> > > Changes in v5:
> > > - return -EBUSY when the P2M_DMA_PIN check fails;
> > > - rename _guest_physmap_pin_range to pin_one_pte;
> > > - rename _guest_physmap_unpin_range to unpin_one_pte.
> > >
> > > Changes in v4:
> > > - use p2m_walker to implement guest_physmap_pin_range and
> > >   guest_physmap_unpin_range;
> > > - return -EINVAL when the P2M_DMA_PIN check fails;
> > > - change the printk into a gdprintk;
> > > - add a comment on what type of page can be pinned.
> > > ---
> > >  xen/arch/arm/p2m.c         |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > >  xen/include/asm-arm/mm.h   |    4 +++
> > >  xen/include/asm-arm/page.h |    7 +++++-
> > >  xen/include/asm-x86/p2m.h  |   12 +++++++++++
> > >  4 files changed, 70 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > > index a9ceacf..bac6c7e 100644
> > > --- a/xen/arch/arm/p2m.c
> > > +++ b/xen/arch/arm/p2m.c
> > > @@ -154,6 +154,46 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr)
> > >      return p2m.maddr;
> > >  }
> > >
> > > +static int pin_one_pte(lpae_t *ptep, void *arg, int level)
> > > +{
> > > +    lpae_t pte = *ptep;
> > > +    ASSERT(level == 3);
> > > +
> > > +    if ( pte.p2m.avail & P2M_DMA_PIN )
> > > +        return -EBUSY;
> > > +    pte.p2m.avail |= P2M_DMA_PIN;
> > > +    write_pte(ptep, pte);
> > > +    return 0;
> > > +}
> > > +
> > > +int guest_physmap_pin_range(struct domain *d,
> > > +                            xen_pfn_t gpfn,
> > > +                            unsigned int order)
> > > +{
> > > +    return p2m_walker(d, gpfn << PAGE_SHIFT, order,
> > > +                      pin_one_pte, NULL);
> >
> > If this fails then you will have left some subset of the pages
> > successfully pinned. You need to clean it up I think, or be fatal to the
> > guest or something.
>
> Making the failure of guest_physmap_pin_range fatal to the guest would
> also solve the non-trivial problem of error handling in memory_exchange.

Actually, I don't think this will work.

Consider two concurrent I/O operations dispatched by two threads in the
same process, to/from non-overlapping regions of the same page.

Hrm, we may actually need a reference count on the pin :-(

We might need to rethink here -- i.e. use a type count on the underlying
machine page rather than a pin in the p2m entry? That kind of makes
sense since it will be the underlying page which is exposed to the
hardware...

Ian.
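[Editorial note: Ian's reference-count idea can be modelled abstractly. A hypothetical C sketch — not Xen code; a real implementation would fold this into the page type/count machinery under the appropriate lock. The point is that the p2m pin bit only changes on the 0→1 and 1→0 transitions of a per-page counter, so overlapping pin/unpin pairs from concurrent I/O on the same page compose correctly.]

```c
#include <assert.h>

/* Hypothetical per-page pin state, for illustration only. */
struct pin_state {
    unsigned int count;    /* how many pins are outstanding */
    int p2m_pinned;        /* models the P2M_DMA_PIN bit in the pte */
};

/* Take a pin; the (modelled) p2m bit is set only on the first pin. */
static void pin(struct pin_state *p)
{
    if ( p->count++ == 0 )
        p->p2m_pinned = 1;
}

/* Drop a pin; the p2m bit is cleared only when the last pin goes away. */
static void unpin(struct pin_state *p)
{
    assert(p->count > 0);
    if ( --p->count == 0 )
        p->p2m_pinned = 0;
}
```

With a single pin bit instead, the second thread's pin would fail with -EBUSY (or its unpin would prematurely release the first thread's pin), which is exactly the race Ian describes.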
Jan Beulich
2013-Sep-11 06:31 UTC
Re: [PATCH v5 1/4] xen: move steal_page and donate_page to common code
>>> On 10.09.13 at 19:20, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Tue, 10 Sep 2013, Jan Beulich wrote:
>> >>> On 09.09.13 at 18:06, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>> > --- a/xen/common/memory.c
>> > +++ b/xen/common/memory.c
>> > @@ -230,6 +230,91 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
>> >      return 1;
>> >  }
>> >
>> > +int donate_page(
>> > +    struct domain *d, struct page_info *page, unsigned int memflags)
>> > +{
>> > +    spin_lock(&d->page_alloc_lock);
>> > +
>> > +    if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) )
>> > +        goto fail;
>> > +
>> > +    if ( d->is_dying )
>> > +        goto fail;
>> > +
>> > +    if ( page->count_info & ~(PGC_allocated | 1) )
>>
>> While the literal 1 was acceptable in arch specific code, I'm not sure
>> it really is in generic code. Nothing enforces that an arch has the
>> count starting at bit 0. Even if it reads a little odd, using
>> (-PGC_count_mask & PGC_count_mask) would probably be an
>> acceptable replacement here and further down.
>
> (-PGC_count_mask & PGC_count_mask) is always 0.
>
> I guess you mean:
>
>     if ( page->count_info & ~(PGC_count_mask | PGC_allocated) )

No, I really meant what I wrote, resulting in

    if ( page->count_info & ~(PGC_allocated |
                              (-PGC_count_mask & PGC_count_mask)) )

Jan
Jan Beulich
2013-Sep-11 06:33 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
>>> On 10.09.13 at 20:32, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Tue, 10 Sep 2013, Jan Beulich wrote:
>> >>> On 10.09.13 at 14:50, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > On Tue, 2013-09-10 at 13:15 +0100, Jan Beulich wrote:
>> >
>> >> > @@ -459,6 +460,52 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
>> >> >  * The zero value is appropiate.
>> >> >  */
>> >> >
>> >> > +#define XENMEM_exchange_and_pin 26
>> >> > +/*
>> >> > + * This hypercall is similar to XENMEM_exchange: it takes the same
>> >> > + * struct as an argument and it exchanges the pages passed in with a new
>> >> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
>> >> > + * that their p2m mapping won't be changed until explicitly "unpinned".
>> >> > + * The content of the exchanged pages is lost.
>> >> > + * Only normal guest r/w memory can be pinned: no granted pages or
>> >> > + * ballooned pages.
>> >> > + * If return code is zero then @out.extent_list provides the DMA frame
>> >> > + * numbers of the newly-allocated memory.
>> >>
>> >> "DMA"? I don't think that term is universally true across all possible
>> >> architectures (but we're in an architecture independent header
>> >> here). "Machine" would probably be better (as it implies CPU
>> >> perspective, whereas DMA hints at device perspective).
>> >
>> > I think DMA here is correct. The purpose of exchange and pin is so that
>> > the page can be safely handed to a device for DMA.
>> >
>> > On an architecture where DMA address != Machine address then this should
>> > indeed return the DMA addresses.
>>
>> One problem is that I think there are architectures where there's no
>> single canonical DMA address; such an address may depend on the
>> placement of a device in the system's topology. Hence I don't think
>> it would even be possible to return "the" DMA address here. It ought
>> to be the machine address (CPU view), and the consumer ought to
>> know how to translate this to a DMA address for a particular device.
>
> We could leave it up to each architecture to specify whether the
> hypercall returns a DMA address or a Machine address (according to
> their definition of DMA address and Machine address).
> Even if this is a common header.

That would likely result in ugly conditionals in the kernel side code
consuming this. But yes, considering that we're not currently aware of
ports to such architectures (I have no idea how MIPS works in this
regard; that's the only architecture I'm aware people have shown some
interest in), we might as well do it that way. But then please change
the comment to say so.

Jan
Stefano Stabellini
2013-Sep-11 16:55 UTC
Re: [PATCH v5 2/4] xen/arm: introduce a generic p2m walker and use it in p2m_lookup
On Tue, 10 Sep 2013, Ian Campbell wrote:
> On Tue, 2013-09-10 at 19:00 +0100, Stefano Stabellini wrote:
> > > > -        /* This bit must be one in the level 3 entry */
> > > > -        if ( !pte.p2m.table )
> > > > -            pte.bits = 0;
> > > > +        if ( cur_first_offset != first_table_offset(paddr) )
> > > > +        {
> > > > +            if (second) unmap_domain_page(second);
> > > > +            second = map_domain_page(first[first_table_offset(paddr)].p2m.base);
> > > > +            cur_first_offset = first_table_offset(paddr);
> > > > +        }
> > > > +        level++;
> > > > +        if ( !second ||
> > >
> > > ASSERT(second) seems reasonable here, I think, since it would indicate
> > > we had screwed up the p2m pretty badly.
> >
> > I think it's possible: what if a memory range is simply missing from the
> > p2m? The second level pagetable pages could be missing too.
>
> Then the level above it shouldn't have the valid bit set.

Oh, right!
Stefano Stabellini
2013-Sep-27 14:46 UTC
Re: [PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
On Tue, 10 Sep 2013, Ian Campbell wrote:
> On Tue, 2013-09-10 at 20:11 +0100, Stefano Stabellini wrote:
> > On Tue, 10 Sep 2013, Ian Campbell wrote:
> > > On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:
> > > > guest_physmap_pin_range pins a range of guest pages so that their p2m
> > > > mappings won't be changed.
> > > > guest_physmap_unpin_range unpins the previously pinned pages.
> > > > The pinning is done using one of the spare bits in the p2m ptes.
> > > >
> > > > Use the newly introduce p2m_walker to implement the two functions on
> > > > ARM.
> > > > Provide empty stubs for x86.
> > > >
> > > >
> > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > >
> > > >
> > > > Changes in v5:
> > > > - return -EBUSY when the P2M_DMA_PIN check fails;
> > > > - rename _guest_physmap_pin_range to pin_one_pte;
> > > > - rename _guest_physmap_unpin_range to unpin_one_pte.
> > > >
> > > > Changes in v4:
> > > > - use p2m_walker to implement guest_physmap_pin_range and
> > > >   guest_physmap_unpin_range;
> > > > - return -EINVAL when the P2M_DMA_PIN check fails;
> > > > - change the printk into a gdprintk;
> > > > - add a comment on what type of page can be pinned.
> > > > ---
> > > >  xen/arch/arm/p2m.c         |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > >  xen/include/asm-arm/mm.h   |    4 +++
> > > >  xen/include/asm-arm/page.h |    7 +++++-
> > > >  xen/include/asm-x86/p2m.h  |   12 +++++++++++
> > > >  4 files changed, 70 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > > > index a9ceacf..bac6c7e 100644
> > > > --- a/xen/arch/arm/p2m.c
> > > > +++ b/xen/arch/arm/p2m.c
> > > > @@ -154,6 +154,46 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr)
> > > >      return p2m.maddr;
> > > >  }
> > > >
> > > > +static int pin_one_pte(lpae_t *ptep, void *arg, int level)
> > > > +{
> > > > +    lpae_t pte = *ptep;
> > > > +    ASSERT(level == 3);
> > > > +
> > > > +    if ( pte.p2m.avail & P2M_DMA_PIN )
> > > > +        return -EBUSY;
> > > > +    pte.p2m.avail |= P2M_DMA_PIN;
> > > > +    write_pte(ptep, pte);
> > > > +    return 0;
> > > > +}
> > > > +
> > > > +int guest_physmap_pin_range(struct domain *d,
> > > > +                            xen_pfn_t gpfn,
> > > > +                            unsigned int order)
> > > > +{
> > > > +    return p2m_walker(d, gpfn << PAGE_SHIFT, order,
> > > > +                      pin_one_pte, NULL);
> > >
> > > If this fails then you will have left some subset of the pages
> > > successfully pinned. You need to clean it up I think, or be fatal to the
> > > guest or something.
> >
> > Making the failure of guest_physmap_pin_range fatal to the guest would
> > also solve the non-trivial problem of error handling in memory_exchange.
>
> Actually, I don't think this will work.
>
> Consider two concurrent I/O operations dispatched by two threads in the
> same process, to/from non-overlapping regions of the same page.
>
> Hrm, we may actually need a reference count on the pin :-(
>
> We might need to rethink here -- i.e. use a type count on the underlying
> machine page rather than a pin in the p2m entry? That kind of makes
> sense since it will be the underlying page which is exposed to the
> hardware...

I think that we should only allow pinning at page granularity level.

Also I think it should be the guest to do the reference counting: it
needs to keep track of the mfn_to_pfn mappings anyway, it might as well
keep track of the references too, so that it can avoid issuing an
hypercall to pin a page that is already pinned.
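[Editorial note: Stefano's alternative — counting on the guest side — would look something like the following hypothetical sketch. xen_pin_page/xen_unpin_page are invented stand-ins for the (assumed) hypercall wrappers, and the flat pin_refs table is a toy; the point is that the guest only enters the hypervisor on the first pin and the last unpin of a frame.]

```c
#include <stdint.h>

/* Invented stand-ins for the hypercall wrappers; the counters let us
 * observe how often the hypervisor would actually be entered. */
static int pin_hypercalls, unpin_hypercalls;
static void xen_pin_page(uint64_t pfn)   { (void)pfn; pin_hypercalls++; }
static void xen_unpin_page(uint64_t pfn) { (void)pfn; unpin_hypercalls++; }

#define MAX_PFNS 16                      /* toy table, for illustration */
static unsigned int pin_refs[MAX_PFNS];  /* guest-side reference counts */

/* Pin for one user; the hypercall is issued only on the 0 -> 1 edge. */
static void guest_pin(uint64_t pfn)
{
    if ( pin_refs[pfn]++ == 0 )
        xen_pin_page(pfn);
}

/* Unpin for one user; the hypercall is issued only on the 1 -> 0 edge. */
static void guest_unpin(uint64_t pfn)
{
    if ( --pin_refs[pfn] == 0 )
        xen_unpin_page(pfn);
}
```

This keeps the hypervisor-side interface a simple single pin bit per pte, at the cost of trusting the guest to balance its own pin/unpin pairs.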
Stefano Stabellini
2013-Sep-27 14:49 UTC
Re: [PATCH v5 4/4] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
On Tue, 10 Sep 2013, Jan Beulich wrote:
> > +    unsigned long i;
> > +    struct xen_unpin unpin;
> > +    xen_pfn_t gpfn;
> > +    struct domain *d;
> > +
> > +    if ( copy_from_guest(&unpin, arg, 1) )
> > +        return -EFAULT;
> > +
> > +    /* Various sanity checks. */
> > +    if ( /* Extent orders are sensible? */
> > +         (unpin.in.extent_order > MAX_ORDER) ||
> > +         /* Sizes of input list do not overflow a long? */
> > +         ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) )
> > +        return -EFAULT;
> > +
> > +    if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) )
> > +        return -EFAULT;
> > +
> > +    d = rcu_lock_domain_by_any_id(unpin.in.domid);
> > +    if ( d == NULL )
> > +    {
> > +        rc = -ESRCH;
> > +        goto fail;
> > +    }
> > +
> > +    for ( i = 0; i < unpin.in.nr_extents; i++ )
> > +    {
> > +        if ( unlikely(__copy_from_guest_offset(
> > +                 &gpfn, unpin.in.extent_start, i, 1)) )
> > +        {
> > +            rc = -EFAULT;
> > +            goto partial;
> > +        }
> > +
> 
> Want/need to make sure gpfn is suitably aligned for the passed-in
> extent order? Depends on whether guest_physmap_unpin_range()
> is expected to always cope with a mis-aligned one.

I think it should be able to cope with non-order-aligned requests.
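The alignment check Jan is asking about amounts to testing the low `extent_order` bits of the starting gpfn: an order-N extent covers 2^N pages, so an aligned request must start on a 2^N-page boundary. A minimal stand-alone sketch (the helper name `gpfn_aligned` is hypothetical, not part of the patch):

```c
#include <stdint.h>

/*
 * Sketch of a gpfn/extent_order alignment check.  An order-N extent
 * covers 2^N pages, so the low N bits of an aligned starting gpfn must
 * be zero.  Hypothetical helper name, for illustration only.
 */
static int gpfn_aligned(uint64_t gpfn, unsigned int extent_order)
{
    return (gpfn & ((UINT64_C(1) << extent_order) - 1)) == 0;
}
```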
Stefano Stabellini
2013-Sep-27 15:23 UTC
Re: [PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
On Tue, 10 Sep 2013, Ian Campbell wrote:
> On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:
> > 
> > +static int pin_one_pte(lpae_t *ptep, void *arg, int level)
> > +{
> > +    lpae_t pte = *ptep;
> > +    ASSERT(level == 3);
> > +
> > +    if ( pte.p2m.avail & P2M_DMA_PIN )
> > +        return -EBUSY;
> > +    pte.p2m.avail |= P2M_DMA_PIN;
> > +    write_pte(ptep, pte);
> > +    return 0;
> > +}
> > +
> > +int guest_physmap_pin_range(struct domain *d,
> > +                            xen_pfn_t gpfn,
> > +                            unsigned int order)
> > +{
> > +    return p2m_walker(d, gpfn << PAGE_SHIFT, order,
> > +                      pin_one_pte, NULL);
> 
> Did we not also discuss accounting and limits on the amount of memory a
> guest can lock down?

I am thinking of allowing hardware domains (is_hardware_domain(d)) to
pin any number of pages, while preventing any other domains from doing
it.
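The policy proposed here comes down to a single gate before pinning. The sketch below is stand-alone C with `struct domain` and `is_hardware_domain()` stubbed for illustration; in Xen the real ones come from the scheduler headers and the error would be `-EPERM`:

```c
#include <stdbool.h>

/*
 * Sketch of a "hardware domain only" pinning policy.  struct domain and
 * is_hardware_domain() are stubs standing in for the real Xen versions.
 */

#define SKETCH_EPERM (-1)  /* stand-in for -EPERM */

struct domain {
    bool is_hardware;
};

static bool is_hardware_domain(const struct domain *d)
{
    return d->is_hardware;
}

/* Gate to run before calling guest_physmap_pin_range(). */
static int check_pin_allowed(const struct domain *d)
{
    return is_hardware_domain(d) ? 0 : SKETCH_EPERM;
}
```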
Ian Campbell
2013-Sep-27 15:27 UTC
Re: [PATCH v5 3/4] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
On Fri, 2013-09-27 at 16:23 +0100, Stefano Stabellini wrote:
> On Tue, 10 Sep 2013, Ian Campbell wrote:
> > On Mon, 2013-09-09 at 17:06 +0100, Stefano Stabellini wrote:
> > > 
> > > +static int pin_one_pte(lpae_t *ptep, void *arg, int level)
> > > +{
> > > +    lpae_t pte = *ptep;
> > > +    ASSERT(level == 3);
> > > +
> > > +    if ( pte.p2m.avail & P2M_DMA_PIN )
> > > +        return -EBUSY;
> > > +    pte.p2m.avail |= P2M_DMA_PIN;
> > > +    write_pte(ptep, pte);
> > > +    return 0;
> > > +}
> > > +
> > > +int guest_physmap_pin_range(struct domain *d,
> > > +                            xen_pfn_t gpfn,
> > > +                            unsigned int order)
> > > +{
> > > +    return p2m_walker(d, gpfn << PAGE_SHIFT, order,
> > > +                      pin_one_pte, NULL);
> > 
> > Did we not also discuss accounting and limits on the amount of memory a
> > guest can lock down?
> 
> I am thinking of allowing hardware domains (is_hardware_domain(d)) to
> pin any number of pages, while preventing any other domains from doing
> it.

That would work.

FWIW XENMEM_exchange uses multipage_allocation_permitted, which allows
any guest to allocate 2MB contiguous chunks, and domains with a rangeset
(i.e. dom0 or passthrough) to allocate whichever chunk they want (or at
least lets them try). I'm not sure that pinning and exchanging are
analogous though; your blanket privileged-only rule will suffice until
we implement passthrough.

Ian.