Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 0/3] introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
Hi all,

this patch series introduces two new hypercalls to allow autotranslate
guests to allocate a contiguous buffer in machine addresses.
XENMEM_get_dma_buf returns the mfns and makes sure to pin the pages so
that the hypervisor won't change their p2m mappings while in use.
XENMEM_put_dma_buf simply unpins the pages.

Cheers,

Stefano

Changes in v2:
- actually don't print the warning more than once.

Changes in v3:
- implement guest_physmap_pin_range and guest_physmap_unpin_range on ARM;
- implement XENMEM_put_dma_buf.

Stefano Stabellini (3):
      xen/arm: implement steal_page
      xen: implement guest_physmap_(un)pin_range
      xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf

 xen/arch/arm/mm.c           |   43 ++++++++++++++
 xen/arch/arm/p2m.c          |  131 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/memory.c         |   73 +++++++++++++++++++++++-
 xen/include/asm-arm/mm.h    |    4 +
 xen/include/asm-arm/page.h  |    6 ++-
 xen/include/asm-x86/p2m.h   |   12 ++++
 xen/include/public/memory.h |   64 +++++++++++++++++++++
 7 files changed, 329 insertions(+), 4 deletions(-)

git://xenbits.xen.org/people/sstabellini/xen-unstable.git get_dma_buf_3
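As a rough illustration of how a guest might drive the proposed interface,
consider the sketch below. It is illustrative only, not part of the series:
HYPERVISOR_memory_op(), set_xen_guest_handle() and XENMEMF_address_bits()
are assumed to be the usual Linux-style guest-side helpers, and the
make_dma_buffer/unmake_dma_buffer names are invented for the example. It
trades sixteen scattered pages for one pinned, machine-contiguous order-4
buffer below 4G.

#include <string.h>
#include <xen/xen.h>
#include <xen/memory.h>

int make_dma_buffer(xen_pfn_t in_gpfns[16], xen_pfn_t out_gpfn,
                    xen_pfn_t *machine_frame)
{
    xen_pfn_t frame = out_gpfn;   /* GPFN base in, overwritten with MFN */
    struct xen_get_dma_buf get;
    int rc;

    memset(&get, 0, sizeof(get)); /* also zeroes nr_exchanged, as required */

    set_xen_guest_handle(get.in.extent_start, in_gpfns);
    get.in.nr_extents    = 16;    /* sixteen order-0 extents in...        */
    get.in.extent_order  = 0;
    get.in.domid         = DOMID_SELF;

    set_xen_guest_handle(get.out.extent_start, &frame);
    get.out.nr_extents   = 1;     /* ...one order-4 extent out            */
    get.out.extent_order = 4;
    get.out.mem_flags    = XENMEMF_address_bits(32); /* e.g. below 4G     */
    get.out.domid        = DOMID_SELF;

    rc = HYPERVISOR_memory_op(XENMEM_get_dma_buf, &get);
    if ( rc == 0 )
        *machine_frame = frame;   /* the out list now holds the MFN       */
    return rc;
}

/* Later, release the pinning: the GPFN base, not the MFN, identifies the
 * buffer to unpin. */
int unmake_dma_buffer(xen_pfn_t gpfn_base)
{
    xen_pfn_t frame = gpfn_base;
    struct xen_put_dma_buf put;

    memset(&put, 0, sizeof(put));
    set_xen_guest_handle(put.in.extent_start, &frame);
    put.in.nr_extents   = 1;
    put.in.extent_order = 4;
    put.in.domid        = DOMID_SELF;

    return HYPERVISOR_memory_op(XENMEM_put_dma_buf, &put);
}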
Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 1/3] xen/arm: implement steal_page
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index f301e65..ea64c03 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -751,6 +751,49 @@ int donate_page(struct domain *d, struct page_info *page, unsigned int memflags)
 int steal_page(
     struct domain *d, struct page_info *page, unsigned int memflags)
 {
+    unsigned long x, y;
+    bool_t drop_dom_ref = 0;
+
+    spin_lock(&d->page_alloc_lock);
+
+    if ( is_xen_heap_page(page) || (page_get_owner(page) != d) )
+        goto fail;
+
+    /*
+     * We require there is just one reference (PGC_allocated). We temporarily
+     * drop this reference now so that we can safely swizzle the owner.
+     */
+    y = page->count_info;
+    do {
+        x = y;
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+            goto fail;
+        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
+    } while ( y != x );
+
+    /* Swizzle the owner then reinstate the PGC_allocated reference. */
+    page_set_owner(page, NULL);
+    y = page->count_info;
+    do {
+        x = y;
+        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
+    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
+
+    /* Unlink from original owner. */
+    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
+        drop_dom_ref = 1;
+    page_list_del(page, &d->page_list);
+
+    spin_unlock(&d->page_alloc_lock);
+    if ( unlikely(drop_dom_ref) )
+        put_domain(d);
+    return 0;
+
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    printk("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%lx\n",
+           (void *)page_to_mfn(page), d, d->domain_id,
+           page_get_owner(page), page->count_info, page->u.inuse.type_info);
     return -1;
 }
-- 
1.7.2.5
Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
guest_physmap_pin_range pins a range of guest pages so that they p2m
mappings won't be changed.
guest_physmap_unpin_range unpins the previously pinned pages.

The pinning is done using one of the spare bits in the p2m ptes.

Provide empty stubs for x86.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/arch/arm/p2m.c         |  131 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/mm.h   |    4 +
 xen/include/asm-arm/page.h |    6 ++-
 xen/include/asm-x86/p2m.h  |   12 ++++
 4 files changed, 152 insertions(+), 1 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 307c6d4..7543ca8 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -75,6 +75,129 @@ done:
     return maddr;
 }
 
+
+int guest_physmap_pin_range(struct domain *d,
+                            xen_pfn_t gpfn,
+                            unsigned int order)
+{
+    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
+    paddr_t addr = gpfn << PAGE_SHIFT;
+    struct p2m_domain *p2m = &d->arch.p2m;
+    int rc = -EFAULT, i;
+    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
+
+    spin_lock(&p2m->lock);
+
+    first = __map_domain_page(p2m->first_level);
+
+    if ( !first ||
+         !first[first_table_offset(addr)].p2m.valid ||
+         !first[first_table_offset(addr)].p2m.table )
+        goto err;
+
+    for ( i = 0; i < (1UL << order); i++ )
+    {
+        if ( cur_first_offset != first_table_offset(addr) )
+        {
+            if (second) unmap_domain_page(second);
+            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
+            cur_first_offset = first_table_offset(addr);
+        }
+        if ( !second ||
+             !second[second_table_offset(addr)].p2m.valid ||
+             !second[second_table_offset(addr)].p2m.table )
+            goto err;
+
+        if ( cur_second_offset != second_table_offset(addr) )
+        {
+            if (third) unmap_domain_page(third);
+            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
+            cur_second_offset = second_table_offset(addr);
+        }
+        if ( !third ||
+             !third[third_table_offset(addr)].p2m.valid ||
+             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
+            goto err;
+
+        pte = third[third_table_offset(addr)];
+        pte.p2m.avail |= P2M_DMA_PIN;
+        write_pte(&third[third_table_offset(addr)], pte);
+
+        addr += PAGE_SIZE;
+    }
+
+    rc = 0;
+
+err:
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
+
+    spin_unlock(&p2m->lock);
+
+    return rc;
+}
+
+int guest_physmap_unpin_range(struct domain *d,
+                              xen_pfn_t gpfn,
+                              unsigned int order)
+{
+    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
+    paddr_t addr = gpfn << PAGE_SHIFT;
+    struct p2m_domain *p2m = &d->arch.p2m;
+    int rc = -EFAULT, i;
+    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
+
+    spin_lock(&p2m->lock);
+
+    first = __map_domain_page(p2m->first_level);
+
+    if ( !first ||
+         !first[first_table_offset(addr)].p2m.valid ||
+         !first[first_table_offset(addr)].p2m.table )
+        goto err;
+
+    for ( i = 0; i < (1UL << order); i++ )
+    {
+        if ( cur_first_offset != first_table_offset(addr) )
+        {
+            if (second) unmap_domain_page(second);
+            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
+            cur_first_offset = first_table_offset(addr);
+        }
+        if ( !second ||
+             !second[second_table_offset(addr)].p2m.valid ||
+             !second[second_table_offset(addr)].p2m.table )
+            goto err;
+
+        if ( cur_second_offset != second_table_offset(addr) )
+        {
+            if (third) unmap_domain_page(third);
+            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
+            cur_second_offset = second_table_offset(addr);
+        }
+        if ( !third ||
+             !third[third_table_offset(addr)].p2m.valid ||
+             !(third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN) )
+            goto err;
+
+        pte = third[third_table_offset(addr)];
+        pte.p2m.avail &= ~P2M_DMA_PIN;
+        write_pte(&third[third_table_offset(addr)], pte);
+
+        addr += PAGE_SIZE;
+    }
+
+    rc = 0;
+
+err:
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
+
+    spin_unlock(&p2m->lock);
+
+    return rc;
+}
+
 int guest_physmap_mark_populate_on_demand(struct domain *d,
                                           unsigned long gfn,
                                           unsigned int order)
@@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
             cur_second_offset = second_table_offset(addr);
         }
 
+        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
+        {
+            rc = -EINVAL;
+            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
+                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);
+            goto out;
+        }
+
         flush = third[third_table_offset(addr)].p2m.valid;
 
         /* Allocate a new RAM page and attach */
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 5e7c5a3..8db8816 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -319,6 +319,10 @@ void free_init_memory(void);
 
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                           unsigned int order);
+int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn,
+                            unsigned int order);
+int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn,
+                              unsigned int order);
 
 extern void put_page_type(struct page_info *page);
 static inline void put_page_and_type(struct page_info *page)
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 41e9eff..12087d0 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -153,11 +153,15 @@ typedef struct {
     unsigned long hint:1;       /* In a block of 16 contiguous entries */
     unsigned long sbz2:1;
     unsigned long xn:1;         /* eXecute-Never */
-    unsigned long avail:4;      /* Ignored by hardware */
+    unsigned long avail:4;      /* Ignored by hardware, see below */
 
     unsigned long sbz1:5;
 } __attribute__((__packed__)) lpae_p2m_t;
 
+/* Xen "avail" bits allocation in third level entries */
+#define P2M_DMA_PIN (1<<0)  /* The page has been "pinned": the hypervisor
+                               promises not to change the p2m mapping. */
+
 /*
  * Walk is the common bits of p2m and pt entries which are needed to
  * simply walk the table (e.g. for debug).
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 43583b2..afc7738 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -492,6 +492,18 @@ void guest_physmap_remove_page(struct domain *d,
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                           unsigned int order);
+static inline int guest_physmap_pin_range(struct domain *d,
+                                          xen_pfn_t gpfn,
+                                          unsigned int order)
+{
+    return -ENOSYS;
+}
+static inline int guest_physmap_unpin_range(struct domain *d,
+                                            xen_pfn_t gpfn,
+                                            unsigned int order)
+{
+    return -ENOSYS;
+}
 
 /* Change types across all p2m entries in a domain */
 void p2m_change_entry_type_global(struct domain *d,
-- 
1.7.2.5
Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
Introduce two new hypercalls XENMEM_get_dma_buf and XENMEM_put_dma_buf.

XENMEM_get_dma_buf exchanges a set of pages for a new set, contiguous and
under 4G if so requested. The new pages are going to be "pinned" so that
their p2m mapping won't be changed, until XENMEM_put_dma_buf is called.
XENMEM_get_dma_buf returns the MFNs of the new pages to the caller.

The only effect of XENMEM_put_dma_buf is to "unpin" the previously
pinned pages. Afterwards the p2m mappings can be transparently changed by
the hypervisor as normal. The memory remains accessible from the guest.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/common/memory.c         |   73 +++++++++++++++++++++++++++++++++++++++++--
 xen/include/public/memory.h |   64 +++++++++++++++++++++++++++++++++++++
 2 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 50b740f..3e1db2a 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -279,7 +279,7 @@ static void decrease_reservation(struct memop_args *a)
     a->nr_done = i;
 }
 
-static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
+static long memory_exchange(int op, XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
 {
     struct xen_memory_exchange exch;
     PAGE_LIST_HEAD(in_chunk_list);
@@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
             mfn = page_to_mfn(page);
             guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
 
-            if ( !paging_mode_translate(d) )
+            if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )
             {
                 for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
                     set_gpfn_from_mfn(mfn + k, gpfn + k);
@@ -505,6 +505,18 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
                                                 &mfn, 1) )
                     rc = -EFAULT;
             }
+
+            if ( op == XENMEM_get_dma_buf )
+            {
+                static int warning;
+                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
+                {
+                    if (!warning) {
+                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");
+                        warning = 1;
+                    }
+                }
+            }
         }
         BUG_ON( !(d->is_dying) && (j != (1UL << out_chunk_order)) );
     }
@@ -542,6 +554,53 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
     return rc;
 }
 
+static long put_dma_buf(XEN_GUEST_HANDLE_PARAM(xen_put_dma_buf_t) arg)
+{
+    int rc, i;
+    struct xen_put_dma_buf put_dma_buf;
+    xen_pfn_t gpfn;
+    struct domain *d;
+
+    if ( copy_from_guest(&put_dma_buf, arg, 1) )
+        return -EFAULT;
+
+    /* Various sanity checks. */
+    if ( /* Extent orders are sensible? */
+         (put_dma_buf.in.extent_order > MAX_ORDER) ||
+         /* Sizes of input list do not overflow a long? */
+         ((~0UL >> put_dma_buf.in.extent_order) < put_dma_buf.in.nr_extents) )
+        return -EFAULT;
+
+    if ( !guest_handle_okay(put_dma_buf.in.extent_start, put_dma_buf.in.nr_extents) )
+        return -EFAULT;
+
+    d = rcu_lock_domain_by_any_id(put_dma_buf.in.domid);
+    if ( d == NULL )
+        return -ESRCH;
+
+    for ( i = 0; i < put_dma_buf.in.nr_extents; i++ )
+    {
+        if ( unlikely(__copy_from_guest_offset(
+                          &gpfn, put_dma_buf.in.extent_start, i, 1)) )
+        {
+            rc = -EFAULT;
+            goto fail;
+        }
+
+        rc = guest_physmap_unpin_range(d, gpfn, put_dma_buf.in.extent_order);
+        if ( rc )
+            goto fail;
+    }
+
+    rc = 0;
+
+ fail:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -630,8 +690,15 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         break;
 
+    case XENMEM_get_dma_buf:
+        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
+         * just cast it and reuse memory_exchange */
     case XENMEM_exchange:
-        rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
+        rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t));
+        break;
+
+    case XENMEM_put_dma_buf:
+        rc = put_dma_buf(guest_handle_cast(arg, xen_put_dma_buf_t));
         break;
 
     case XENMEM_maximum_ram_page:
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 7a26dee..354b117 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -459,6 +459,70 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
  * The zero value is appropiate.
  */
 
+#define XENMEM_get_dma_buf 26
+/*
+ * This hypercall is similar to XENMEM_exchange: it exchanges the pages
+ * passed in with a new set of pages, contiguous and under 4G if so
+ * requested. The new pages are going to be "pinned": it's guaranteed
+ * that their p2m mapping won't be changed until explicitly "unpinned".
+ * If return code is zero then @out.extent_list provides the MFNs of the
+ * newly-allocated memory. Returns zero on complete success, otherwise
+ * a negative error code.
+ * On complete success then always @nr_exchanged == @in.nr_extents. On
+ * partial success @nr_exchanged indicates how much work was done.
+ */
+struct xen_get_dma_buf {
+    /*
+     * [IN] Details of memory extents to be exchanged (GMFN bases).
+     * Note that @in.address_bits is ignored and unused.
+     */
+    struct xen_memory_reservation in;
+
+    /*
+     * [IN/OUT] Details of new memory extents.
+     * We require that:
+     *  1. @in.domid == @out.domid
+     *  2. @in.nr_extents << @in.extent_order ==
+     *     @out.nr_extents << @out.extent_order
+     *  3. @in.extent_start and @out.extent_start lists must not overlap
+     *  4. @out.extent_start lists GPFN bases to be populated
+     *  5. @out.extent_start is overwritten with allocated GMFN bases
+     */
+    struct xen_memory_reservation out;
+
+    /*
+     * [OUT] Number of input extents that were successfully exchanged:
+     *  1. The first @nr_exchanged input extents were successfully
+     *     deallocated.
+     *  2. The corresponding first entries in the output extent list correctly
+     *     indicate the GMFNs that were successfully exchanged.
+     *  3. All other input and output extents are untouched.
+     *  4. If not all input exents are exchanged then the return code of this
+     *     command will be non-zero.
+     *  5. THIS FIELD MUST BE INITIALISED TO ZERO BY THE CALLER!
+     */
+    xen_ulong_t nr_exchanged;
+};
+typedef struct xen_get_dma_buf xen_get_dma_buf_t;
+DEFINE_XEN_GUEST_HANDLE(xen_get_dma_buf_t);
+
+#define XENMEM_put_dma_buf 27
+/*
+ * XENMEM_put_dma_buf unpins a set of pages, previously pinned by
+ * XENMEM_get_dma_buf. After this call the p2m mapping of the pages can
+ * be transparently changed by the hypervisor, as usual. The pages are
+ * still accessible from the guest.
+ */
+struct xen_put_dma_buf {
+    /*
+     * [IN] Details of memory extents to be exchanged (GMFN bases).
+     * Note that @in.address_bits is ignored and unused.
+     */
+    struct xen_memory_reservation in;
+};
+typedef struct xen_put_dma_buf xen_put_dma_buf_t;
+DEFINE_XEN_GUEST_HANDLE(xen_put_dma_buf_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
-- 
1.7.2.5
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:

Isn't this identical to the x86 version? Can we make that common
instead?

> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> ---
>  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
> [...]
On Thu, 8 Aug 2013, Ian Campbell wrote:
> On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
>
> Isn't this identical to the x86 version? Can we make that common
> instead?

Yes, it is identical. I can move it to common/memory.c.

> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > ---
> >  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 43 insertions(+), 0 deletions(-)
> > [...]
Ian Campbell
2013-Aug-08 10:40 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> Introduce two new hypercalls XENMEM_get_dma_buf and XENMEM_put_dma_buf.
> 
> XENMEM_get_dma_buf exchanges a set of pages for a new set, contiguous and
> under 4G if so requested. The new pages are going to be "pinned" so that
> their p2m mapping won't be changed, until XENMEM_put_dma_buf is called.
> XENMEM_get_dma_buf returns the MFNs of the new pages to the caller.
> 
> The only effect of XENMEM_put_dma_buf is to "unpin" the previously
> pinned pages. Afterwards the p2m mappings can be transparently changed by
> the hypervisor as normal. The memory remains accessible from the guest.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

This is a common code (and ABI) change so you should have CCd Keir.
I've added him.

> [...]
> +            if ( op == XENMEM_get_dma_buf )

I think this should be before the copy back of the mfn.

> +            {
> +                static int warning;
> +                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> +                {
> +                    if (!warning) {
> +                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");
> +                        warning = 1;
> +                    }

Not all error codes are ENOSYS. Please just propagate the error code
and fail the hypercall. No need to log IMHO.

> [...]
> +#define XENMEM_get_dma_buf 26
> +/*
> + * This hypercall is similar to XENMEM_exchange:

And its argument struct is identical.

I think you should just reuse struct xen_memory_exchange (and update
those comments to handle both cases). Then you get rid of the nasty
cast which you had to add. (I'm not sure without looking it up if
casting between two structs which happen to have identical
declarations is valid C.)

> it exchanges the pages
> + * passed in with a new set of pages, contiguous and under 4G if so
> + * requested.

I guess it avoids exchanging the pages if they already meet this
constraint?

The "if so requested" binds to "under 4G" and not "contiguous"? The
result is always contiguous I think. Also 4G is just an example and the
caller can ask for any order it likes.

The contents of the pages will be lost I think. Perhaps in debug builds
we should consider actually nuking the page's content in the case where
we don't actually exchange?

> The new pages are going to be "pinned": it's guaranteed
> + * that their p2m mapping won't be changed until explicitly "unpinned".

We have a privilege level check here through the use of
multipage_allocation_permitted, which permits up to 2MB mappings. But
unlike the original exchange these can be cumulative. Do we need a per
domain limit on the number of pinned p2m pages?

My gut feeling is yes, otherwise a guest can pin 1 page in every 2MB
range, which isn't so bad now but will cause fragmentation issues when
we switch to 2MB p2m mappings.

> [...]
> + *  4. If not all input exents are exchanged then the return code of this

"extents"

> + *     command will be non-zero.
> + *  5. THIS FIELD MUST BE INITIALISED TO ZERO BY THE CALLER!
> + */
> [...]
Ian Campbell
2013-Aug-08 10:52 UTC
Re: [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> guest_physmap_pin_range pins a range of guest pages so that they p2m

s/they/their/

> mappings won't be changed.
> guest_physmap_unpin_range unpins the previously pinned pages.
> 
> The pinning is done using one of the spare bits in the p2m ptes.
> 
> Provide empty stubs for x86.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> [...]
> +        if ( !third ||
> +             !third[third_table_offset(addr)].p2m.valid ||
> +             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )

At this point rc == -EFAULT, which doesn't seem right when failing the
P2M_DMA_PIN check.

> +            goto err;
> +
> +        pte = third[third_table_offset(addr)];
> +        pte.p2m.avail |= P2M_DMA_PIN;
> +        write_pte(&third[third_table_offset(addr)], pte);
> +
> +        addr += PAGE_SIZE;
> +    }
> +
> +    rc = 0;
> +
> +err:

Leaks final mapping of third I think?

You add two p2m walkers in this patch, and we have at least one
already. Perhaps it is time for a generic function which calls a
callback on the leaf ptes?

> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +
> +    return rc;
> +}
> +
> +int guest_physmap_unpin_range(struct domain *d,
> +                              xen_pfn_t gpfn,
> +                              unsigned int order)
> [...]

Same generic comments as the pin version.

> [...]
> @@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
>              cur_second_offset = second_table_offset(addr);
>          }
> 
> +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> +        {
> +            rc = -EINVAL;
> +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> +                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);

Not sure if a guest can trigger this, if yes then gdprintk.

Do we always clear this bit when we clear the present bit? If not then
we could end up with entries which are pinned and not-present. We
should probably avoid that one way or another!

> +            goto out;
> +        }
> +
>          flush = third[third_table_offset(addr)].p2m.valid;
> [...]
> +/* Xen "avail" bits allocation in third level entries */
> +#define P2M_DMA_PIN (1<<0)  /* The page has been "pinned": the hypervisor
> +                               promises not to change the p2m mapping. */

I guess we are going to want to keep type info here eventually, this
leaves 3 bits for that. For reference x86 has the following p2m types:

    p2m_ram_rw = 0,             /* Normal read/write guest RAM */
    p2m_invalid = 1,            /* Nothing mapped here */
    p2m_ram_logdirty = 2,       /* Temporarily read-only for log-dirty */
    p2m_ram_ro = 3,             /* Read-only; writes are silently dropped */
    p2m_mmio_dm = 4,            /* Reads and write go to the device model */
    p2m_mmio_direct = 5,        /* Read/write mapping of genuine MMIO area */
    p2m_populate_on_demand = 6, /* Place-holder for empty memory */

    /* Although these are defined in all builds, they can only
     * be used in 64-bit builds */
    p2m_grant_map_rw = 7,       /* Read/write grant mapping */
    p2m_grant_map_ro = 8,       /* Read-only grant mapping */
    p2m_ram_paging_out = 9,     /* Memory that is being paged out */
    p2m_ram_paged = 10,         /* Memory that has been paged out */
    p2m_ram_paging_in = 11,     /* Memory that is being paged in */
    p2m_ram_shared = 12,        /* Shared or sharable memory */
    p2m_ram_broken = 13,        /* Broken page, access cause domain crash */

We can get away with not having some of these for now, but not
necessarily in the long term I think. 4 available bits is really too
few :-(

I'm unsure if pinning is orthogonal to type. I think it probably is? Or
can we say that only normal ram can be pinned? Actually that may make
sense since many of the other types are either implicitly pinned
already (grant map, mmio etc) or pinning makes no sense (broken) or
pinning would have to change the type anyway (pin shared -> unshare ->
ram_rw +pinned, likewise for the paging types).

So the pinnable types are p2m_ram_rw, p2m_ram_ro, p2m_ram_logdirty? We
could perhaps declare that ro and logdirty are implicitly pinned too?
For log dirty it's pretty obvious, not so sure about ro -- doing DMA
even just from r/o memory seems a bit dodgy to allow.

Ian.
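The generic leaf walker Ian suggests might look roughly like the sketch
below. This is only an illustration of the idea, not code from the series:
the name p2m_walk_leaves and the callback type are invented here, and the
walk mirrors the structure of the pin/unpin functions quoted above
(including unmapping the third-level table that the posted version leaks).

/* Sketch only: call fn on each third-level pte covering gpfn..gpfn+2^order.
 * Pin, unpin and create_p2m_entries could all be expressed on top of it. */
typedef int (*p2m_leaf_fn_t)(lpae_t *leaf, void *data);

static int p2m_walk_leaves(struct domain *d, xen_pfn_t gpfn,
                           unsigned int order, p2m_leaf_fn_t fn, void *data)
{
    lpae_t *first, *second = NULL, *third = NULL;
    paddr_t addr = (paddr_t)gpfn << PAGE_SHIFT;
    struct p2m_domain *p2m = &d->arch.p2m;
    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
    unsigned long i;
    int rc = -EFAULT;

    spin_lock(&p2m->lock);

    first = __map_domain_page(p2m->first_level);
    if ( !first[first_table_offset(addr)].p2m.valid ||
         !first[first_table_offset(addr)].p2m.table )
        goto out;

    for ( i = 0; i < (1UL << order); i++ )
    {
        if ( cur_first_offset != first_table_offset(addr) )
        {
            if ( second ) unmap_domain_page(second);
            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
            cur_first_offset = first_table_offset(addr);
        }
        if ( !second[second_table_offset(addr)].p2m.valid ||
             !second[second_table_offset(addr)].p2m.table )
            goto out;

        if ( cur_second_offset != second_table_offset(addr) )
        {
            if ( third ) unmap_domain_page(third);
            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
            cur_second_offset = second_table_offset(addr);
        }

        /* Hand the leaf pte to the caller: pin, unpin, type change, ... */
        rc = fn(&third[third_table_offset(addr)], data);
        if ( rc )
            goto out;

        addr += PAGE_SIZE;
    }
    rc = 0;

 out:
    if ( third ) unmap_domain_page(third);
    if ( second ) unmap_domain_page(second);
    unmap_domain_page(first);
    spin_unlock(&p2m->lock);
    return rc;
}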
Jan Beulich
2013-Aug-08 11:43 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> @@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>          mfn = page_to_mfn(page);
>          guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
> 
> -        if ( !paging_mode_translate(d) )
> +        if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )

This is bogus: Whether the translation tables need updating
here shouldn't depend on the operation.

> @@ -505,6 +505,18 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>                                                  &mfn, 1) )
>                      rc = -EFAULT;
>              }
> +
> +            if ( op == XENMEM_get_dma_buf )
> +            {
> +                static int warning;

bool_t. Missing blank line. And this should go into the next scope,
or the two if-s should be combined.

> +                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> +                {
> +                    if (!warning) {

Coding style.

> +                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");

Is "not implemented" really the only failure mode?

> +static long put_dma_buf(XEN_GUEST_HANDLE_PARAM(xen_put_dma_buf_t) arg)
> +{
> +    int rc, i;

Please no signed variables for loops having an unsigned upper
bound. Even more, the bound is xen_ulong_t, i.e. the loop below
may end up running forever on 64-bit at least.

> @@ -630,8 +690,15 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> 
>          break;
> 
> +    case XENMEM_get_dma_buf:
> +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
> +         * just cast it and reuse memory_exchange */

Then why do you define a new type for it in the first place?

And anyway, the naming of the new sub-op isn't really
representing what the operation is capable of.
XENMEM_exchange_and_pin would seem to come much closer.

>      case XENMEM_exchange:
> -        rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
> +        rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t));
> +        break;
> +
> +    case XENMEM_put_dma_buf:
> +        rc = put_dma_buf(guest_handle_cast(arg, xen_put_dma_buf_t));

And similarly this should be XENMEM_unpin or some such.

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -459,6 +459,70 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
>   * The zero value is appropiate.
>   */
> 
> +#define XENMEM_get_dma_buf 26
> +/*
> + * This hypercall is similar to XENMEM_exchange: it exchanges the pages
> + * passed in with a new set of pages, contiguous and under 4G if so

The mentioning of 4G here is pointless - the hypercall can do all
sorts of address range restriction, so why mention this special
case?

Jan
Keir Fraser
2013-Aug-08 14:12 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On 08/08/2013 11:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

>> it exchanges the pages
>> + * passed in with a new set of pages, contiguous and under 4G if so
>> + * requested.
> 
> I guess it avoids exchanging the pages if they already meet this
> constraint?
> 
> The "if so requested" binds to "under 4G" and not "contiguous"? The
> result is always contiguous I think. Also 4G is just an example and the
> caller can ask for any order it likes.
> 
> The contents of the pages will be lost I think. Perhaps in debug builds
> we should consider actually nuking the page's content in the case where
> we don't actually exchange?

This confused me too, this talk of contiguity and being below 4GB. Doesn't
this new hypercall behave *exactly* like XENMEM_exchange, except for the
additional pinning of the MFNs? If so, just say that in the patch header and
code comments. The fact *you* currently use it to get memory below 4GB is an
application detail.

I wonder whether XENMEM_exchange is much use at all in the presence of
paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
could be suitably adjusted to have the pinning semantics in this case?

 -- Keir
Stefano Stabellini
2013-Aug-08 14:12 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Jan Beulich wrote:
> >>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > @@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >          mfn = page_to_mfn(page);
> >          guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
> > 
> > -        if ( !paging_mode_translate(d) )
> > +        if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )
> 
> This is bogus: Whether the translation tables need updating
> here shouldn't depend on the operation.

This is key: I need a hypercall that writes back the mfn of the new
buffer. The old hypercall doesn't do it. This time I specifically wrote
in a comment:

 * If return code is zero then @out.extent_list provides the MFNs of the
 * newly-allocated memory.

Also it makes sense for the old hypercall not to return the mfn to
autotranslate guests because the p2m mappings are not pinned, so it
can't guarantee that they won't change.

> > +            if ( op == XENMEM_get_dma_buf )
> > +            {
> > +                static int warning;
> 
> bool_t. Missing blank line. And this should go into the next scope,
> or the two if-s should be combined.
> 
> > +                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> > +                {
> > +                    if (!warning) {
> 
> Coding style.
> 
> > +                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");
> 
> Is "not implemented" really the only failure mode?

My mistake. I'll remove the warning and propagate the failure instead.

> > +static long put_dma_buf(XEN_GUEST_HANDLE_PARAM(xen_put_dma_buf_t) arg)
> > +{
> > +    int rc, i;
> 
> Please no signed variables for loops having an unsigned upper
> bound. Even more, the bound is xen_ulong_t, i.e. the loop below
> may end up running forever on 64-bit at least.

Argh, thanks for catching this.

> > +    case XENMEM_get_dma_buf:
> > +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
> > +         * just cast it and reuse memory_exchange */
> 
> Then why do you define a new type for it in the first place?

Because the original hypercall doesn't copy back the mfn if the guest
is an autotranslate guest.
And because it doesn't pin the p2m mappings.

> And anyway, the naming of the new sub-op isn't really
> representing what the operation is capable of.
> XENMEM_exchange_and_pin would seem to come much closer.

I like this suggestion. I could also keep xen_memory_exchange as
argument for XENMEM_exchange_and_pin, as Ian suggested.

> >      case XENMEM_exchange:
> > -        rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
> > +        rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t));
> > +        break;
> > +
> > +    case XENMEM_put_dma_buf:
> > +        rc = put_dma_buf(guest_handle_cast(arg, xen_put_dma_buf_t));
> 
> And similarly this should be XENMEM_unpin or some such.

OK.

> > --- a/xen/include/public/memory.h
> > +++ b/xen/include/public/memory.h
> > @@ -459,6 +459,70 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> >   * The zero value is appropiate.
> >   */
> > 
> > +#define XENMEM_get_dma_buf 26
> > +/*
> > + * This hypercall is similar to XENMEM_exchange: it exchanges the pages
> > + * passed in with a new set of pages, contiguous and under 4G if so
> 
> The mentioning of 4G here is pointless - the hypercall can do all
> sorts of address range restriction, so why mention this special
> case?

Good point.
Stefano Stabellini
2013-Aug-08 14:16 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Keir Fraser wrote:
> On 08/08/2013 11:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
> 
> >> it exchanges the pages
> >> + * passed in with a new set of pages, contiguous and under 4G if so
> >> + * requested.
> > [...]
> 
> This confused me too, this talk of contiguity and being below 4GB. Doesn't
> this new hypercall behave *exactly* like XENMEM_exchange, except for the
> additional pinning of the MFNs? If so, just say that in the patch header and
> code comments. The fact *you* currently use it to get memory below 4GB is an
> application detail.

Yeah, you have a good point there. Trying to make the comment clearer I
actually confused it. I'll fix it.

> I wonder whether XENMEM_exchange is much use at all in the presence of
> paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
> could be suitably adjusted to have the pinning semantics in this case?

It's not just about pinning, we also need to get the mfn back even if
the guest is autotranslate, which XENMEM_exchange doesn't do.
Tim Deegan
2013-Aug-08 14:26 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
At 15:16 +0100 on 08 Aug (1375974960), Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Keir Fraser wrote:
> > [...]
> > I wonder whether XENMEM_exchange is much use at all in the presence of
> > paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
> > could be suitably adjusted to have the pinning semantics in this case?
> 
> It's not just about pinning, we also need to get the mfn back even if
> the guest is autotranslate, which XENMEM_exchange doesn't do.

I'd prefer if this were phrased as a DMA address, or something like
that. On systems with an IOMMU (e.g. x86) this address might not be an
MFN.

Tim.
Ian Campbell
2013-Aug-08 14:27 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 2013-08-08 at 15:16 +0100, Stefano Stabellini wrote:
> [...]
> It's not just about pinning, we also need to get the mfn back even if
> the guest is autotranslate, which XENMEM_exchange doesn't do.

Plus current users won't be expecting to have to unpin when they are
done, so just adding pinning to the existing function won't work. The
struct doesn't have a flags field so we can't do it that way either.

I suppose we could deprecate the existing exchange in favour of one
with a flags field, but that doesn't seem any better than renaming this
new addition XENMEM_exchange_and_pin.

Ian.
Jan Beulich
2013-Aug-08 14:38 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 08.08.13 at 16:12, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 8 Aug 2013, Jan Beulich wrote:
>> >>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>> > @@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>> >          mfn = page_to_mfn(page);
>> >          guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
>> > 
>> > -        if ( !paging_mode_translate(d) )
>> > +        if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )
>> 
>> This is bogus: Whether the translation tables need updating
>> here shouldn't depend on the operation.
> 
> This is key: I need a hypercall that writes back the mfn of the new
> buffer. The old hypercall doesn't do it. This time I specifically wrote
> in a comment:
> 
>  * If return code is zero then @out.extent_list provides the MFNs of the
>  * newly-allocated memory.
> 
> Also it makes sense for the old hypercall not to return the mfn to
> autotranslate guests because the p2m mappings are not pinned, so it
> can't guarantee that they won't change.

Oh, you're talking about one half of the if() body, and I'm talking
about the other (the part that was visible from the patch context):
I agree that copying back the MFNs is useful/necessary here. But
I don't see why you'd want to call set_gpfn_from_mfn() - that
should have been taken care of already. So perhaps split the one if
into two?

>> > +    case XENMEM_get_dma_buf:
>> > +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
>> > +         * just cast it and reuse memory_exchange */
>> 
>> Then why do you define a new type for it in the first place?
> 
> Because the original hypercall doesn't copy back the mfn if the guest
> is an autotranslate guest.
> And because it doesn't pin the p2m mappings.

Hmm, I'm confused: Here you reason why you want a new type
(note that I said "type", not "sub-hypercall"), ...

>> And anyway, the naming of the new sub-op isn't really
>> representing what the operation is capable of.
>> XENMEM_exchange_and_pin would seem to come much closer.
> 
> I like this suggestion. I could also keep xen_memory_exchange as
> argument for XENMEM_exchange_and_pin, as Ian suggested.

... yet here you agree that xen_memory_exchange could be
re-used.

Jan
Stefano Stabellini
2013-Aug-08 14:44 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Jan Beulich wrote:
> >>> On 08.08.13 at 16:12, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > [...]
> 
> Oh, you're talking about one half of the if() body, and I'm talking
> about the other (the part that was visible from the patch context):
> I agree that copying back the MFNs is useful/necessary here. But
> I don't see why you'd want to call set_gpfn_from_mfn() - that
> should have been taken care of already. So perhaps split the one if
> into two?

Right, good suggestion.

> >> And anyway, the naming of the new sub-op isn't really
> >> representing what the operation is capable of.
> >> XENMEM_exchange_and_pin would seem to come much closer.
> > 
> > I like this suggestion. I could also keep xen_memory_exchange as
> > argument for XENMEM_exchange_and_pin, as Ian suggested.
> 
> ... yet here you agree that xen_memory_exchange could be
> re-used.

I meant only to keep the same parameter struct xen_memory_exchange.
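Concretely, the split Jan asks for (together with Ian's earlier point that
pinning should happen before the MFN copy-back) might shape the loop body
like the sketch below. This is illustrative only, using the
XENMEM_exchange_and_pin name proposed above rather than code from any
posted version; the surrounding variable names follow the quoted
memory_exchange() code.

/* Sketch: m2p update depends only on the paging mode; the MFN copy-back
 * also happens for the new sub-op, but only after pinning succeeded, and
 * pinning errors are propagated instead of merely logged. */
if ( !paging_mode_translate(d) )
{
    for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
        set_gpfn_from_mfn(mfn + k, gpfn + k);
}

if ( op == XENMEM_exchange_and_pin )
    rc = guest_physmap_pin_range(d, gpfn, exch.out.extent_order);

if ( !rc && (op == XENMEM_exchange_and_pin || !paging_mode_translate(d)) )
{
    if ( __copy_to_guest_offset(exch.out.extent_start,
                                (i << out_chunk_order) + j, &mfn, 1) )
        rc = -EFAULT;
}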
Keir Fraser
2013-Aug-08 14:47 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On 08/08/2013 15:16, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:

>> This confused me too, this talk of contiguity and being below 4GB. Doesn't
>> this new hypercall behave *exactly* like XENMEM_exchange, except for the
>> additional pinning of the MFNs? If so, just say that in the patch header and
>> code comments. The fact *you* currently use it to get memory below 4GB is an
>> application detail.
> 
> Yeah, you have a good point there. Trying to make the comment clearer I
> actually confused it. I'll fix it.
> 
>> I wonder whether XENMEM_exchange is much use at all in the presence of
>> paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
>> could be suitably adjusted to have the pinning semantics in this case?
> 
> It's not just about pinning, we also need to get the mfn back even if
> the guest is autotranslate, which XENMEM_exchange doesn't do.

Ok, fine by me.

 -- Keir
Jan Beulich
2013-Aug-08 14:50 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 08.08.13 at 16:44, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 8 Aug 2013, Jan Beulich wrote:
>> Hmm, I'm confused: Here you reason why you want a new type
>> (note that I said "type", not "sub-hypercall"), ...
>> [...]
>> ... yet here you agree that xen_memory_exchange could be
>> re-used.
> 
> I meant only to keep the same parameter struct xen_memory_exchange.

And that's what I asked for.

Jan
Stefano Stabellini
2013-Aug-08 14:54 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Jan Beulich wrote:
> >>> On 08.08.13 at 16:44, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > On Thu, 8 Aug 2013, Jan Beulich wrote:
> >> >>> On 08.08.13 at 16:12, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Thu, 8 Aug 2013, Jan Beulich wrote:
> >> >> >>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> >> >> > @@ -630,8 +690,15 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >> >> >
> >> >> >          break;
> >> >> >
> >> >> > +    case XENMEM_get_dma_buf:
> >> >> > +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
> >> >> > +         * just cast it and reuse memory_exchange */
> >> >>
> >> >> Then why do you define a new type for it in the first place?
> >> >
> >> > Because the original hypercall doesn't copy back the mfn if the guest
> >> > is an autotranslate guest.
> >> > And because it doesn't pin the p2m mappings.
> >>
> >> Hmm, I'm confused: Here you reason why you want a new type
> >> (note that I said "type", not "sub-hypercall"), ...
> >>
> >> >> And anyway, the naming of the new sub-op isn't really
> >> >> representing what the operation is capable of.
> >> >> XENMEM_exchange_and_pin would seem to come much closer.
> >> >
> >> > I like this suggestion. I could also keep xen_memory_exchange as
> >> > argument for XENMEM_exchange_and_pin, as Ian suggested.
> >>
> >> ... yet here you agree that xen_memory_exchange could be
> >> re-used.
> >
> > I meant only to keep the same parameter struct xen_memory_exchange.
>
> And that's what I asked for.

OK, I'll try to be clearer: I think it's reasonable to have a new
hypercall, called XENMEM_exchange_and_pin, using the same parameter
struct as XENMEM_exchange, struct xen_memory_exchange.
Actually Ian suggested this.

Do you agree?

If not, given that the parameters are actually the same, do you think I
should duplicate and rename the struct as I was doing in this patch?
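[For concreteness, a sketch of what the agreed interface could look like in xen/include/public/memory.h. The subop numbers below are placeholders, not the values eventually assigned:]

    /* Hypothetical illustration only. */
    #define XENMEM_exchange_and_pin     26
    #define XENMEM_unpin                27

    /* Both subops take the existing struct xen_memory_exchange, so no new
     * argument structure is needed.  XENMEM_exchange_and_pin behaves like
     * XENMEM_exchange but additionally pins the replacement pages and
     * copies their addresses back via @out.extent_list even for
     * autotranslated guests; XENMEM_unpin releases the pinning. */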
Jan Beulich
2013-Aug-08 14:58 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 08.08.13 at 16:54, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> OK, I'll try to be clearer: I think it's reasonable to have a new
> hypercall, called XENMEM_exchange_and_pin, using the same parameter
> struct as XENMEM_exchange, struct xen_memory_exchange.
> Actually Ian suggested this.
>
> Do you agree?

Yes.

> If not, given that the parameters are actually the same, do you think I
> should duplicate and rename the struct as I was doing in this patch?

It was actually this duplication that I wanted to see removed.

Jan
Stefano Stabellini
2013-Aug-08 15:04 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Tim Deegan wrote:
> At 15:16 +0100 on 08 Aug (1375974960), Stefano Stabellini wrote:
> > On Thu, 8 Aug 2013, Keir Fraser wrote:
> > > On 08/08/2013 11:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
> > >
> > > >>                        it exchanges the pages
> > > >> + * passed in with a new set of pages, contiguous and under 4G if so
> > > >> + * requested.
> > > >
> > > > I guess it avoids exchanging the pages if they already meet this
> > > > constraint?
> > > >
> > > > The "if so requested" binds to "under 4G" and not "contiguous"? The
> > > > result is always contiguous I think. Also 4G is just an example and the
> > > > caller can ask for any order it likes.
> > > >
> > > > The contents of the pages will be lost I think. Perhaps in debug builds
> > > > we should consider actually nuking the page's content in the case where
> > > > we don't actually exchange?
> > >
> > > This confused me too, this talk of contiguity and being below 4GB. Doesn't
> > > this new hypercall behave *exactly* like XENMEM_exchange, except for the
> > > additional pinning of the MFNs? If so, just say that in the patch header and
> > > code comments. The fact *you* currently use it to get memory below 4GB is an
> > > application detail.
> >
> > Yeah, you have a good point there. Trying to make the comment clearer I
> > actually confused it. I'll fix it.
> >
> > > I wonder whether XENMEM_exchange is much use at all in the presence of
> > > paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
> > > could be suitably adjusted to have the pinning semantics in this case?
> >
> > It's not just about pinning: we also need to get the mfn back even if
> > the guest is autotranslate, which XENMEM_exchange doesn't do.
>
> I'd prefer if this were phrased as a DMA address, or something like
> that. On systems with an IOMMU (e.g. x86) this address might not be an
> MFN.

I can see why we would want to make that clear. I'll write DMA address
in the comment.
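[For concreteness, one possible wording for the revised interface comment, folding in Tim's point. This is a sketch only, not the wording eventually committed:]

    /*
     * On success, @out.extent_list is filled with the DMA addresses of
     * the replacement pages: on systems with an IOMMU (e.g. x86) these
     * are not necessarily MFNs.
     */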
Ian Campbell
2013-Aug-08 15:25 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
>  xen/include/public/memory.h | 64 +++++++++++++++++++++++++++++++++++++

Since I've just applied 73f18583dd82 "xen: arm: document which
hypercalls (and subops) are supported on ARM" please can you also add
this new call there.

Ian.
Konrad Rzeszutek Wilk
2013-Aug-09 15:46 UTC
Re: [PATCH v3 1/3] xen/arm: implement steal_page
On Mon, Aug 05, 2013 at 05:40:54PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> ---
>  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index f301e65..ea64c03 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -751,6 +751,49 @@ int donate_page(struct domain *d, struct page_info *page, unsigned int memflags)
>  int steal_page(
>      struct domain *d, struct page_info *page, unsigned int memflags)
>  {
> +    unsigned long x, y;
> +    bool_t drop_dom_ref = 0;

Shouldn't this be true or false?

> +
> +    spin_lock(&d->page_alloc_lock);
> +
> +    if ( is_xen_heap_page(page) || (page_get_owner(page) != d) )
> +        goto fail;
> +
> +    /*
> +     * We require there is just one reference (PGC_allocated). We temporarily
> +     * drop this reference now so that we can safely swizzle the owner.
> +     */
> +    y = page->count_info;
> +    do {
> +        x = y;
> +        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
> +            goto fail;
> +        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
> +    } while ( y != x );
> +
> +    /* Swizzle the owner then reinstate the PGC_allocated reference. */
> +    page_set_owner(page, NULL);
> +    y = page->count_info;
> +    do {
> +        x = y;
> +        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
> +    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
> +
> +    /* Unlink from original owner. */
> +    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
> +        drop_dom_ref = 1;
> +    page_list_del(page, &d->page_list);
> +
> +    spin_unlock(&d->page_alloc_lock);
> +    if ( unlikely(drop_dom_ref) )
> +        put_domain(d);
> +    return 0;
> +
> + fail:
> +    spin_unlock(&d->page_alloc_lock);
> +    printk("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%lx\n",
> +           (void *)page_to_mfn(page), d, d->domain_id,
> +           page_get_owner(page), page->count_info, page->u.inuse.type_info);
>      return -1;
>  }
>
> --
> 1.7.2.5
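[As an aside for readers new to the idiom: both loops in the quoted hunk are lock-free read-modify-write updates of page->count_info. A generic sketch of the pattern, where still_valid() and new_value() are hypothetical placeholders:]

    unsigned long x, y = page->count_info;
    do {
        x = y;                       /* work on a snapshot of the word */
        if ( !still_valid(x) )       /* hypothetical validity check */
            goto fail;
        /* cmpxchg returns the old value: if it differs from our
         * snapshot, another CPU raced with us, so retry from there. */
        y = cmpxchg(&page->count_info, x, new_value(x));
    } while ( y != x );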
Stefano Stabellini
2013-Aug-13 16:05 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Ian Campbell wrote:
> >                 The new pages are going to be "pinned": it's guaranteed
> > + * that their p2m mapping won't be changed until explicitly "unpinned".
>
> We have a privilege level check here through the use of
> multipage_allocation_permitted, which permits up to 2MB mappings. But
> unlike the original exchange these can be cumulative. Do we need a per
> domain limit on the number of pinned p2m pages? My gut feeling is yes,
> otherwise a guest can pin 1 page in every 2MB range, which isn't so bad
> now but will cause fragmentation issues when we switch to 2MB p2m
> mappings.

Wouldn't it cause fragmentation issues only to itself?

As far as I can tell the only problem that pinned p2m pages can cause to
the hypervisor is preventing Xen from swapping or sharing domain pages.
Ian Campbell
2013-Aug-13 22:07 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Tue, 2013-08-13 at 17:05 +0100, Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Ian Campbell wrote:
> > >                 The new pages are going to be "pinned": it's guaranteed
> > > + * that their p2m mapping won't be changed until explicitly "unpinned".
> >
> > We have a privilege level check here through the use of
> > multipage_allocation_permitted, which permits up to 2MB mappings. But
> > unlike the original exchange these can be cumulative. Do we need a per
> > domain limit on the number of pinned p2m pages? My gut feeling is yes,
> > otherwise a guest can pin 1 page in every 2MB range, which isn't so bad
> > now but will cause fragmentation issues when we switch to 2MB p2m
> > mappings.
>
> Wouldn't it cause fragmentation issues only to itself?

What I meant is that if a guest pins one page out of the 512 in a 2MB
mapping then the other 511 pages can't easily/efficiently be used for
other guests even though they are "free".

I don't know if it would be unreasonable to account the full 2MB range
to the guest until it has freed it all, perhaps with some sort of
per-domain "free" list so that the guest can cause itself pain by
reusing them. Needs some thought...

> As far as I can tell the only problem that pinned p2m pages can cause to
> the hypervisor is preventing Xen from swapping or sharing domain pages.

You are right about swapping, but pinning would necessarily imply an
unshare I think. Is there any point in a r/o pin? I suspect not.

Ian.
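[To put rough numbers on the concern, a back-of-the-envelope illustration assuming 4KB pages and 2MB second-level superpage p2m mappings:]

    2MB superpage                = 512 x 4KB pages
    1GB of guest RAM             = 512 superpages
    1 pinned page per superpage  = 512 x 4KB = 2MB actually pinned,
    yet the whole 1GB of that p2m range must stay in 4KB mappings.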
Stefano Stabellini
Re: [PATCH v3 1/3] xen/arm: implement steal_page
On Fri, 9 Aug 2013, Konrad Rzeszutek Wilk wrote:
> On Mon, Aug 05, 2013 at 05:40:54PM +0100, Stefano Stabellini wrote:
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > ---
> >  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 43 insertions(+), 0 deletions(-)
> >
> > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > index f301e65..ea64c03 100644
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -751,6 +751,49 @@ int donate_page(struct domain *d, struct page_info *page, unsigned int memflags)
> >  int steal_page(
> >      struct domain *d, struct page_info *page, unsigned int memflags)
> >  {
> > +    unsigned long x, y;
> > +    bool_t drop_dom_ref = 0;
>
> Shouldn't this be true or false?

Actually, in Xen we use 0 and 1 with bool_t.
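[For reference, bool_t in the Xen tree of this era is a plain integer typedef, approximately as below; treat the exact definition as an assumption:]

    /* xen/include/xen/types.h, approximate: */
    typedef char bool_t;   /* used with 0 and 1 by convention */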
Stefano Stabellini
2013-Aug-14 16:08 UTC
Re: [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
On Thu, 8 Aug 2013, Ian Campbell wrote:
> > +int guest_physmap_pin_range(struct domain *d,
> > +                            xen_pfn_t gpfn,
> > +                            unsigned int order)
> > +{
> > +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> > +    paddr_t addr = gpfn << PAGE_SHIFT;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    int rc = -EFAULT, i;
> > +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> > +
> > +    spin_lock(&p2m->lock);
> > +
> > +    first = __map_domain_page(p2m->first_level);
> > +
> > +    if ( !first ||
> > +         !first[first_table_offset(addr)].p2m.valid ||
> > +         !first[first_table_offset(addr)].p2m.table )
> > +        goto err;
> > +
> > +    for ( i = 0; i < (1UL << order); i++ )
> > +    {
> > +        if ( cur_first_offset != first_table_offset(addr) )
> > +        {
> > +            if (second) unmap_domain_page(second);
> > +            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
> > +            cur_first_offset = first_table_offset(addr);
> > +        }
> > +        if ( !second ||
> > +             !second[second_table_offset(addr)].p2m.valid ||
> > +             !second[second_table_offset(addr)].p2m.table )
> > +            goto err;
> > +
> > +        if ( cur_second_offset != second_table_offset(addr) )
> > +        {
> > +            if (third) unmap_domain_page(third);
> > +            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
> > +            cur_second_offset = second_table_offset(addr);
> > +        }
> > +        if ( !third ||
> > +             !third[third_table_offset(addr)].p2m.valid ||
> > +             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
>
> At this point rc == -EFAULT, which doesn't seem right when failing the
> P2M_DMA_PIN check.

I'll go with -EINVAL. Do you have any better suggestions otherwise?

> > +            goto err;
> > +
> > +        pte = third[third_table_offset(addr)];
> > +        pte.p2m.avail |= P2M_DMA_PIN;
> > +        write_pte(&third[third_table_offset(addr)], pte);
> > +
> > +        addr += PAGE_SIZE;
> > +    }
> > +
> > +    rc = 0;
> > +
> > +err:
>
> Leaks final mapping of third I think?
>
> You add two p2m walkers in this patch, and we have at least one already.
> Perhaps it is time for a generic function which calls a callback on the
> leaf ptes?

OK, good idea.

> > +    if ( second ) unmap_domain_page(second);
> > +    if ( first ) unmap_domain_page(first);
> > +
> > +    spin_unlock(&p2m->lock);
> > +
> > +    return rc;
> > +}
> > +
> > +int guest_physmap_unpin_range(struct domain *d,
> > +                              xen_pfn_t gpfn,
> > +                              unsigned int order)
> > +{
> > +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> > +    paddr_t addr = gpfn << PAGE_SHIFT;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    int rc = -EFAULT, i;
> > +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> [...]
>
> Same generic comments as the pin version.
>
> > +    rc = 0;
> > +
> > +err:
> > +    if ( second ) unmap_domain_page(second);
> > +    if ( first ) unmap_domain_page(first);
> > +
> > +    spin_unlock(&p2m->lock);
> > +
> > +    return rc;
> > +}
> > +
> >  int guest_physmap_mark_populate_on_demand(struct domain *d,
> >                                            unsigned long gfn,
> >                                            unsigned int order)
> > @@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
> >              cur_second_offset = second_table_offset(addr);
> >          }
> >
> > +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> > +        {
> > +            rc = -EINVAL;
> > +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> > +                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);
>
> Not sure if a guest can trigger this, if yes then gdprintk.
>
> Do we always clear this bit when we clear the present bit? If not then
> we could end up with entries which are pinned and not-present. We should
> probably avoid that one way or another!

I went through the code: yes we do, because we always reset the entire
pte entry rather than only the valid bit.

> > +            goto out;
> > +        }
> > +
> >          flush = third[third_table_offset(addr)].p2m.valid;
> >
> >          /* Allocate a new RAM page and attach */
> > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> > index 5e7c5a3..8db8816 100644
> > --- a/xen/include/asm-arm/mm.h
> > +++ b/xen/include/asm-arm/mm.h
> > @@ -319,6 +319,10 @@ void free_init_memory(void);
> >
> >  int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
> >                                            unsigned int order);
> > +int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn,
> > +                            unsigned int order);
> > +int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn,
> > +                              unsigned int order);
> >
> >  extern void put_page_type(struct page_info *page);
> >  static inline void put_page_and_type(struct page_info *page)
> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index 41e9eff..12087d0 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -153,11 +153,15 @@ typedef struct {
> >      unsigned long hint:1;       /* In a block of 16 contiguous entries */
> >      unsigned long sbz2:1;
> >      unsigned long xn:1;         /* eXecute-Never */
> > -    unsigned long avail:4;      /* Ignored by hardware */
> > +    unsigned long avail:4;      /* Ignored by hardware, see below */
> >
> >      unsigned long sbz1:5;
> >  } __attribute__((__packed__)) lpae_p2m_t;
> >
> > +/* Xen "avail" bits allocation in third level entries */
> > +#define P2M_DMA_PIN  (1<<0)  /* The page has been "pinned": the hypervisor
> > +                                promises not to change the p2m mapping. */
>
> I guess we are going to want to keep type info here eventually, this
> leaves 3 bits for that.
>
> For reference x86 has the following p2m types:
>     p2m_ram_rw = 0,             /* Normal read/write guest RAM */
>     p2m_invalid = 1,            /* Nothing mapped here */
>     p2m_ram_logdirty = 2,       /* Temporarily read-only for log-dirty */
>     p2m_ram_ro = 3,             /* Read-only; writes are silently dropped */
>     p2m_mmio_dm = 4,            /* Reads and write go to the device model */
>     p2m_mmio_direct = 5,        /* Read/write mapping of genuine MMIO area */
>     p2m_populate_on_demand = 6, /* Place-holder for empty memory */
>
>     /* Although these are defined in all builds, they can only
>      * be used in 64-bit builds */
>     p2m_grant_map_rw = 7,       /* Read/write grant mapping */
>     p2m_grant_map_ro = 8,       /* Read-only grant mapping */
>     p2m_ram_paging_out = 9,     /* Memory that is being paged out */
>     p2m_ram_paged = 10,         /* Memory that has been paged out */
>     p2m_ram_paging_in = 11,     /* Memory that is being paged in */
>     p2m_ram_shared = 12,        /* Shared or sharable memory */
>     p2m_ram_broken = 13,        /* Broken page, access cause domain crash */
>
> We can get away with not having some of these for now, but not
> necessarily in the long term I think. 4 available bits is really too
> few :-(
>
> I'm unsure if pinning is orthogonal to type. I think it probably is? Or
> can we say that only normal ram can be pinned? Actually that may make
> sense since many of the other types are either implicitly pinned already
> (grant map, mmio etc) or pinning makes no sense (broken) or pinning
> would have to change the type anyway (pin shared -> unshare -> ram_rw
> +pinned, likewise for the paging types).
>
> So the pinnable types are p2m_ram_rw, p2m_ram_ro, p2m_ram_logdirty? We
> could perhaps declare that ro and logdirty are implicitly pinned too?
> For log dirty it's pretty obvious, not so sure about ro -- doing DMA
> even just from r/o memory seems a bit dodgy to allow.

I agree. I think that only p2m_ram_rw should be pinnable. I can add
that to the comment.
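[For illustration, a minimal sketch of the generic leaf-PTE walker being agreed on here -- every name and signature below is hypothetical, not actual Xen code -- which also closes the leak Ian spotted by unmapping third on the common exit path:]

    /* Hypothetical: walk the third-level (leaf) p2m entries backing
     * [gpfn, gpfn + (1 << order)) and apply cb() to each one. */
    typedef int (*p2m_leaf_fn_t)(lpae_t *ptep, void *arg);

    static int p2m_walk_leaf_ptes(struct domain *d, xen_pfn_t gpfn,
                                  unsigned int order,
                                  p2m_leaf_fn_t cb, void *arg)
    {
        lpae_t *first, *second = NULL, *third = NULL;
        paddr_t addr = (paddr_t)gpfn << PAGE_SHIFT;
        struct p2m_domain *p2m = &d->arch.p2m;
        unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
        unsigned long i;
        int rc = -EFAULT;

        spin_lock(&p2m->lock);

        first = __map_domain_page(p2m->first_level);

        for ( i = 0; i < (1UL << order); i++, addr += PAGE_SIZE )
        {
            if ( !first[first_table_offset(addr)].p2m.valid ||
                 !first[first_table_offset(addr)].p2m.table )
                goto err;

            if ( cur_first_offset != first_table_offset(addr) )
            {
                if ( second ) unmap_domain_page(second);
                second = map_domain_page(first[first_table_offset(addr)].p2m.base);
                cur_first_offset = first_table_offset(addr);
            }
            if ( !second[second_table_offset(addr)].p2m.valid ||
                 !second[second_table_offset(addr)].p2m.table )
                goto err;

            if ( cur_second_offset != second_table_offset(addr) )
            {
                if ( third ) unmap_domain_page(third);
                third = map_domain_page(second[second_table_offset(addr)].p2m.base);
                cur_second_offset = second_table_offset(addr);
            }

            if ( (rc = cb(&third[third_table_offset(addr)], arg)) )
                goto err;
        }

        rc = 0;

    err:
        if ( third ) unmap_domain_page(third);   /* previously leaked */
        if ( second ) unmap_domain_page(second);
        unmap_domain_page(first);

        spin_unlock(&p2m->lock);

        return rc;
    }

    /* Pinning then reduces to a small callback: */
    static int pin_one_pte(lpae_t *ptep, void *arg)
    {
        lpae_t pte = *ptep;

        /* Invalid or already-pinned entries fail; the exact error code
         * (-EINVAL vs -EBUSY) is debated elsewhere in this thread. */
        if ( !pte.p2m.valid || (pte.p2m.avail & P2M_DMA_PIN) )
            return -EBUSY;

        pte.p2m.avail |= P2M_DMA_PIN;
        write_pte(ptep, pte);
        return 0;
    }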
Stefano Stabellini
2013-Aug-14 16:17 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Ian Campbell wrote:
> On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> >  xen/include/public/memory.h | 64 +++++++++++++++++++++++++++++++++++++
>
> Since I've just applied 73f18583dd82 "xen: arm: document which
> hypercalls (and subops) are supported on ARM" please can you also add
> this new call there.

Actually you have already written:

 * HYPERVISOR_memory_op
 *  All generic sub-operations.

XENMEM_exchange_and_pin and XENMEM_unpin are generic subops.
Ian Campbell
2013-Aug-15 14:32 UTC
Re: [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
On Wed, 2013-08-14 at 17:08 +0100, Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Ian Campbell wrote:
> > > +        if ( !third ||
> > > +             !third[third_table_offset(addr)].p2m.valid ||
> > > +             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> >
> > At this point rc == -EFAULT, which doesn't seem right when failing the
> > P2M_DMA_PIN check.
>
> I'll go with -EINVAL. Do you have any better suggestions otherwise?

-EBUSY sort of fits here I guess?

> > > +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> > > +        {
> > > +            rc = -EINVAL;
> > > +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> > > +                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);
> >
> > Not sure if a guest can trigger this, if yes then gdprintk.
> >
> > Do we always clear this bit when we clear the present bit? If not then
> > we could end up with entries which are pinned and not-present. We should
> > probably avoid that one way or another!
>
> I went through the code: yes we do, because we always reset the entire
> pte entry rather than only the valid bit.

Good!

Ian.
Ian Campbell
2013-Aug-15 14:32 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Wed, 2013-08-14 at 17:17 +0100, Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Ian Campbell wrote:
> > On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> > >  xen/include/public/memory.h | 64 +++++++++++++++++++++++++++++++++++++
> >
> > Since I've just applied 73f18583dd82 "xen: arm: document which
> > hypercalls (and subops) are supported on ARM" please can you also add
> > this new call there.
>
> Actually you have already written:
>
>  * HYPERVISOR_memory_op
>  *  All generic sub-operations.
>
> XENMEM_exchange_and_pin and XENMEM_unpin are generic subops.

Perfect ;-)