Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 0/3] introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
Hi all,

this patch series introduces two new hypercalls to allow autotranslate
guests to allocate a contiguous buffer in machine addresses.
XENMEM_get_dma_buf returns the mfns and makes sure to pin the pages so
that the hypervisor won't change their p2m mappings while in use.
XENMEM_put_dma_buf simply unpins the pages.

Cheers,

Stefano

Changes in v2:
- actually don't print the warning more than once.

Changes in v3:
- implement guest_physmap_pin_range and guest_physmap_unpin_range on ARM;
- implement XENMEM_put_dma_buf.

Stefano Stabellini (3):
      xen/arm: implement steal_page
      xen: implement guest_physmap_(un)pin_range
      xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf

 xen/arch/arm/mm.c           |   43 ++++++++++++++
 xen/arch/arm/p2m.c          |  131 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/memory.c         |   73 +++++++++++++++++++++++-
 xen/include/asm-arm/mm.h    |    4 +
 xen/include/asm-arm/page.h  |    6 ++-
 xen/include/asm-x86/p2m.h   |   12 ++++
 xen/include/public/memory.h |   64 +++++++++++++++++++++
 7 files changed, 329 insertions(+), 4 deletions(-)

git://xenbits.xen.org/people/sstabellini/xen-unstable.git get_dma_buf_3
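As a rough illustration of how a guest might drive the proposed interface,
consider the sketch below. It is illustrative only, not part of the series:
HYPERVISOR_memory_op(), set_xen_guest_handle() and XENMEMF_address_bits()
are assumed to be the usual Linux-style guest-side helpers, and the
make_dma_buffer/unmake_dma_buffer names are invented for the example. It
trades sixteen scattered pages for one pinned, machine-contiguous order-4
buffer below 4G.

#include <string.h>
#include <xen/xen.h>
#include <xen/memory.h>

int make_dma_buffer(xen_pfn_t in_gpfns[16], xen_pfn_t out_gpfn,
                    xen_pfn_t *machine_frame)
{
    xen_pfn_t frame = out_gpfn;   /* GPFN base in, overwritten with MFN */
    struct xen_get_dma_buf get;
    int rc;

    memset(&get, 0, sizeof(get)); /* also zeroes nr_exchanged, as required */

    set_xen_guest_handle(get.in.extent_start, in_gpfns);
    get.in.nr_extents    = 16;    /* sixteen order-0 extents in...        */
    get.in.extent_order  = 0;
    get.in.domid         = DOMID_SELF;

    set_xen_guest_handle(get.out.extent_start, &frame);
    get.out.nr_extents   = 1;     /* ...one order-4 extent out            */
    get.out.extent_order = 4;
    get.out.mem_flags    = XENMEMF_address_bits(32); /* e.g. below 4G     */
    get.out.domid        = DOMID_SELF;

    rc = HYPERVISOR_memory_op(XENMEM_get_dma_buf, &get);
    if ( rc == 0 )
        *machine_frame = frame;   /* the out list now holds the MFN       */
    return rc;
}

/* Later, release the pinning: the GPFN base, not the MFN, identifies the
 * buffer to unpin. */
int unmake_dma_buffer(xen_pfn_t gpfn_base)
{
    xen_pfn_t frame = gpfn_base;
    struct xen_put_dma_buf put;

    memset(&put, 0, sizeof(put));
    set_xen_guest_handle(put.in.extent_start, &frame);
    put.in.nr_extents   = 1;
    put.in.extent_order = 4;
    put.in.domid        = DOMID_SELF;

    return HYPERVISOR_memory_op(XENMEM_put_dma_buf, &put);
}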
Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 1/3] xen/arm: implement steal_page
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index f301e65..ea64c03 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -751,6 +751,49 @@ int donate_page(struct domain *d, struct page_info *page, unsigned int memflags)
 int steal_page(
     struct domain *d, struct page_info *page, unsigned int memflags)
 {
+    unsigned long x, y;
+    bool_t drop_dom_ref = 0;
+
+    spin_lock(&d->page_alloc_lock);
+
+    if ( is_xen_heap_page(page) || (page_get_owner(page) != d) )
+        goto fail;
+
+    /*
+     * We require there is just one reference (PGC_allocated). We temporarily
+     * drop this reference now so that we can safely swizzle the owner.
+     */
+    y = page->count_info;
+    do {
+        x = y;
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+            goto fail;
+        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
+    } while ( y != x );
+
+    /* Swizzle the owner then reinstate the PGC_allocated reference. */
+    page_set_owner(page, NULL);
+    y = page->count_info;
+    do {
+        x = y;
+        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
+    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
+
+    /* Unlink from original owner. */
+    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
+        drop_dom_ref = 1;
+    page_list_del(page, &d->page_list);
+
+    spin_unlock(&d->page_alloc_lock);
+    if ( unlikely(drop_dom_ref) )
+        put_domain(d);
+    return 0;
+
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    printk("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%lx\n",
+           (void *)page_to_mfn(page), d, d->domain_id,
+           page_get_owner(page), page->count_info, page->u.inuse.type_info);
     return -1;
 }
-- 
1.7.2.5
Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
guest_physmap_pin_range pins a range of guest pages so that they p2m
mappings won't be changed.
guest_physmap_unpin_range unpins the previously pinned pages.

The pinning is done using one of the spare bits in the p2m ptes.

Provide empty stubs for x86.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/arch/arm/p2m.c         |  131 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/mm.h   |    4 +
 xen/include/asm-arm/page.h |    6 ++-
 xen/include/asm-x86/p2m.h  |   12 ++++
 4 files changed, 152 insertions(+), 1 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 307c6d4..7543ca8 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -75,6 +75,129 @@ done:
     return maddr;
 }
 
+
+int guest_physmap_pin_range(struct domain *d,
+                            xen_pfn_t gpfn,
+                            unsigned int order)
+{
+    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
+    paddr_t addr = gpfn << PAGE_SHIFT;
+    struct p2m_domain *p2m = &d->arch.p2m;
+    int rc = -EFAULT, i;
+    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
+
+    spin_lock(&p2m->lock);
+
+    first = __map_domain_page(p2m->first_level);
+
+    if ( !first ||
+         !first[first_table_offset(addr)].p2m.valid ||
+         !first[first_table_offset(addr)].p2m.table )
+        goto err;
+
+    for ( i = 0; i < (1UL << order); i++ )
+    {
+        if ( cur_first_offset != first_table_offset(addr) )
+        {
+            if (second) unmap_domain_page(second);
+            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
+            cur_first_offset = first_table_offset(addr);
+        }
+        if ( !second ||
+             !second[second_table_offset(addr)].p2m.valid ||
+             !second[second_table_offset(addr)].p2m.table )
+            goto err;
+
+        if ( cur_second_offset != second_table_offset(addr) )
+        {
+            if (third) unmap_domain_page(third);
+            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
+            cur_second_offset = second_table_offset(addr);
+        }
+        if ( !third ||
+             !third[third_table_offset(addr)].p2m.valid ||
+             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
+            goto err;
+
+        pte = third[third_table_offset(addr)];
+        pte.p2m.avail |= P2M_DMA_PIN;
+        write_pte(&third[third_table_offset(addr)], pte);
+
+        addr += PAGE_SIZE;
+    }
+
+    rc = 0;
+
+err:
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
+
+    spin_unlock(&p2m->lock);
+
+    return rc;
+}
+
+int guest_physmap_unpin_range(struct domain *d,
+                              xen_pfn_t gpfn,
+                              unsigned int order)
+{
+    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
+    paddr_t addr = gpfn << PAGE_SHIFT;
+    struct p2m_domain *p2m = &d->arch.p2m;
+    int rc = -EFAULT, i;
+    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
+
+    spin_lock(&p2m->lock);
+
+    first = __map_domain_page(p2m->first_level);
+
+    if ( !first ||
+         !first[first_table_offset(addr)].p2m.valid ||
+         !first[first_table_offset(addr)].p2m.table )
+        goto err;
+
+    for ( i = 0; i < (1UL << order); i++ )
+    {
+        if ( cur_first_offset != first_table_offset(addr) )
+        {
+            if (second) unmap_domain_page(second);
+            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
+            cur_first_offset = first_table_offset(addr);
+        }
+        if ( !second ||
+             !second[second_table_offset(addr)].p2m.valid ||
+             !second[second_table_offset(addr)].p2m.table )
+            goto err;
+
+        if ( cur_second_offset != second_table_offset(addr) )
+        {
+            if (third) unmap_domain_page(third);
+            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
+            cur_second_offset = second_table_offset(addr);
+        }
+        if ( !third ||
+             !third[third_table_offset(addr)].p2m.valid ||
+             !(third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN) )
+            goto err;
+
+        pte = third[third_table_offset(addr)];
+        pte.p2m.avail &= ~P2M_DMA_PIN;
+        write_pte(&third[third_table_offset(addr)], pte);
+
+        addr += PAGE_SIZE;
+    }
+
+    rc = 0;
+
+err:
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
+
+    spin_unlock(&p2m->lock);
+
+    return rc;
+}
+
 int guest_physmap_mark_populate_on_demand(struct domain *d,
                                           unsigned long gfn,
                                           unsigned int order)
@@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
             cur_second_offset = second_table_offset(addr);
         }
 
+        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
+        {
+            rc = -EINVAL;
+            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
+                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);
+            goto out;
+        }
+
         flush = third[third_table_offset(addr)].p2m.valid;
 
         /* Allocate a new RAM page and attach */
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 5e7c5a3..8db8816 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -319,6 +319,10 @@ void free_init_memory(void);
 
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                           unsigned int order);
+int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn,
+                            unsigned int order);
+int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn,
+                              unsigned int order);
 
 extern void put_page_type(struct page_info *page);
 static inline void put_page_and_type(struct page_info *page)
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 41e9eff..12087d0 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -153,11 +153,15 @@ typedef struct {
     unsigned long hint:1;       /* In a block of 16 contiguous entries */
     unsigned long sbz2:1;
     unsigned long xn:1;         /* eXecute-Never */
-    unsigned long avail:4;      /* Ignored by hardware */
+    unsigned long avail:4;      /* Ignored by hardware, see below */
 
     unsigned long sbz1:5;
 } __attribute__((__packed__)) lpae_p2m_t;
 
+/* Xen "avail" bits allocation in third level entries */
+#define P2M_DMA_PIN (1<<0)  /* The page has been "pinned": the hypervisor
+                               promises not to change the p2m mapping. */
+
 /*
  * Walk is the common bits of p2m and pt entries which are needed to
  * simply walk the table (e.g. for debug).
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 43583b2..afc7738 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -492,6 +492,18 @@ void guest_physmap_remove_page(struct domain *d,
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                           unsigned int order);
+static inline int guest_physmap_pin_range(struct domain *d,
+                                          xen_pfn_t gpfn,
+                                          unsigned int order)
+{
+    return -ENOSYS;
+}
+static inline int guest_physmap_unpin_range(struct domain *d,
+                                            xen_pfn_t gpfn,
+                                            unsigned int order)
+{
+    return -ENOSYS;
+}
 
 /* Change types across all p2m entries in a domain */
 void p2m_change_entry_type_global(struct domain *d,
-- 
1.7.2.5
Stefano Stabellini
2013-Aug-05 16:40 UTC
[PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
Introduce two new hypercalls XENMEM_get_dma_buf and XENMEM_put_dma_buf.

XENMEM_get_dma_buf exchanges a set of pages for a new set, contiguous and
under 4G if so requested. The new pages are going to be "pinned" so that
their p2m mapping won't be changed, until XENMEM_put_dma_buf is called.
XENMEM_get_dma_buf returns the MFNs of the new pages to the caller.

The only effect of XENMEM_put_dma_buf is to "unpin" the previously
pinned pages. Afterwards the p2m mappings can be transparently changed by
the hypervisor as normal. The memory remains accessible from the guest.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/common/memory.c         |   73 +++++++++++++++++++++++++++++++++++++++++--
 xen/include/public/memory.h |   64 +++++++++++++++++++++++++++++++++++++
 2 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 50b740f..3e1db2a 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -279,7 +279,7 @@ static void decrease_reservation(struct memop_args *a)
     a->nr_done = i;
 }
 
-static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
+static long memory_exchange(int op, XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
 {
     struct xen_memory_exchange exch;
     PAGE_LIST_HEAD(in_chunk_list);
@@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
             mfn = page_to_mfn(page);
             guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
 
-            if ( !paging_mode_translate(d) )
+            if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )
             {
                 for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
                     set_gpfn_from_mfn(mfn + k, gpfn + k);
@@ -505,6 +505,18 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
                                                 &mfn, 1) )
                     rc = -EFAULT;
             }
+
+            if ( op == XENMEM_get_dma_buf )
+            {
+                static int warning;
+                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
+                {
+                    if (!warning) {
+                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");
+                        warning = 1;
+                    }
+                }
+            }
         }
         BUG_ON( !(d->is_dying) && (j != (1UL << out_chunk_order)) );
     }
@@ -542,6 +554,53 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
     return rc;
 }
 
+static long put_dma_buf(XEN_GUEST_HANDLE_PARAM(xen_put_dma_buf_t) arg)
+{
+    int rc, i;
+    struct xen_put_dma_buf put_dma_buf;
+    xen_pfn_t gpfn;
+    struct domain *d;
+
+    if ( copy_from_guest(&put_dma_buf, arg, 1) )
+        return -EFAULT;
+
+    /* Various sanity checks. */
+    if ( /* Extent orders are sensible? */
+         (put_dma_buf.in.extent_order > MAX_ORDER) ||
+         /* Sizes of input list do not overflow a long? */
+         ((~0UL >> put_dma_buf.in.extent_order) < put_dma_buf.in.nr_extents) )
+        return -EFAULT;
+
+    if ( !guest_handle_okay(put_dma_buf.in.extent_start, put_dma_buf.in.nr_extents) )
+        return -EFAULT;
+
+    d = rcu_lock_domain_by_any_id(put_dma_buf.in.domid);
+    if ( d == NULL )
+        return -ESRCH;
+
+    for ( i = 0; i < put_dma_buf.in.nr_extents; i++ )
+    {
+        if ( unlikely(__copy_from_guest_offset(
+                          &gpfn, put_dma_buf.in.extent_start, i, 1)) )
+        {
+            rc = -EFAULT;
+            goto fail;
+        }
+
+        rc = guest_physmap_unpin_range(d, gpfn, put_dma_buf.in.extent_order);
+        if ( rc )
+            goto fail;
+    }
+
+    rc = 0;
+
+ fail:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -630,8 +690,15 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         break;
 
+    case XENMEM_get_dma_buf:
+        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
+         * just cast it and reuse memory_exchange */
     case XENMEM_exchange:
-        rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
+        rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t));
+        break;
+
+    case XENMEM_put_dma_buf:
+        rc = put_dma_buf(guest_handle_cast(arg, xen_put_dma_buf_t));
         break;
 
     case XENMEM_maximum_ram_page:
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 7a26dee..354b117 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -459,6 +459,70 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
  * The zero value is appropiate.
  */
 
+#define XENMEM_get_dma_buf 26
+/*
+ * This hypercall is similar to XENMEM_exchange: it exchanges the pages
+ * passed in with a new set of pages, contiguous and under 4G if so
+ * requested. The new pages are going to be "pinned": it's guaranteed
+ * that their p2m mapping won't be changed until explicitly "unpinned".
+ * If return code is zero then @out.extent_list provides the MFNs of the
+ * newly-allocated memory. Returns zero on complete success, otherwise
+ * a negative error code.
+ * On complete success then always @nr_exchanged == @in.nr_extents. On
+ * partial success @nr_exchanged indicates how much work was done.
+ */
+struct xen_get_dma_buf {
+    /*
+     * [IN] Details of memory extents to be exchanged (GMFN bases).
+     * Note that @in.address_bits is ignored and unused.
+     */
+    struct xen_memory_reservation in;
+
+    /*
+     * [IN/OUT] Details of new memory extents.
+     * We require that:
+     *  1. @in.domid == @out.domid
+     *  2. @in.nr_extents << @in.extent_order ==
+     *     @out.nr_extents << @out.extent_order
+     *  3. @in.extent_start and @out.extent_start lists must not overlap
+     *  4. @out.extent_start lists GPFN bases to be populated
+     *  5. @out.extent_start is overwritten with allocated GMFN bases
+     */
+    struct xen_memory_reservation out;
+
+    /*
+     * [OUT] Number of input extents that were successfully exchanged:
+     *  1. The first @nr_exchanged input extents were successfully
+     *     deallocated.
+     *  2. The corresponding first entries in the output extent list correctly
+     *     indicate the GMFNs that were successfully exchanged.
+     *  3. All other input and output extents are untouched.
+     *  4. If not all input exents are exchanged then the return code of this
+     *     command will be non-zero.
+     *  5. THIS FIELD MUST BE INITIALISED TO ZERO BY THE CALLER!
+     */
+    xen_ulong_t nr_exchanged;
+};
+typedef struct xen_get_dma_buf xen_get_dma_buf_t;
+DEFINE_XEN_GUEST_HANDLE(xen_get_dma_buf_t);
+
+#define XENMEM_put_dma_buf 27
+/*
+ * XENMEM_put_dma_buf unpins a set of pages, previously pinned by
+ * XENMEM_get_dma_buf. After this call the p2m mapping of the pages can
+ * be transparently changed by the hypervisor, as usual. The pages are
+ * still accessible from the guest.
+ */
+struct xen_put_dma_buf {
+    /*
+     * [IN] Details of memory extents to be exchanged (GMFN bases).
+     * Note that @in.address_bits is ignored and unused.
+     */
+    struct xen_memory_reservation in;
+};
+typedef struct xen_put_dma_buf xen_put_dma_buf_t;
+DEFINE_XEN_GUEST_HANDLE(xen_put_dma_buf_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
-- 
1.7.2.5
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:

Isn't this identical to the x86 version? Can we make that common
instead?

> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> ---
>  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
> [...]
On Thu, 8 Aug 2013, Ian Campbell wrote:
> On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
>
> Isn't this identical to the x86 version? Can we make that common
> instead?

Yes, it is identical. I can move it to common/memory.c.

> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > ---
> >  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 43 insertions(+), 0 deletions(-)
> > [...]
Ian Campbell
2013-Aug-08 10:40 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> Introduce two new hypercalls XENMEM_get_dma_buf and XENMEM_put_dma_buf.
> 
> XENMEM_get_dma_buf exchanges a set of pages for a new set, contiguous and
> under 4G if so requested. The new pages are going to be "pinned" so that
> their p2m mapping won't be changed, until XENMEM_put_dma_buf is called.
> XENMEM_get_dma_buf returns the MFNs of the new pages to the caller.
> 
> The only effect of XENMEM_put_dma_buf is to "unpin" the previously
> pinned pages. Afterwards the p2m mappings can be transparently changed by
> the hypervisor as normal. The memory remains accessible from the guest.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

This is a common code (and ABI) change so you should have CCd Keir.
I've added him.

> [...]
> +            if ( op == XENMEM_get_dma_buf )

I think this should be before the copy back of the mfn.

> +            {
> +                static int warning;
> +                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> +                {
> +                    if (!warning) {
> +                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");
> +                        warning = 1;
> +                    }

Not all error codes are ENOSYS. Please just propagate the error code
and fail the hypercall. No need to log IMHO.

> [...]
> +#define XENMEM_get_dma_buf 26
> +/*
> + * This hypercall is similar to XENMEM_exchange:

And its argument struct is identical.

I think you should just reuse struct xen_memory_exchange (and update
those comments to handle both cases). Then you get rid of the nasty
cast which you had to add. (I'm not sure without looking it up if
casting between two structs which happen to have identical
declarations is valid C.)

> it exchanges the pages
> + * passed in with a new set of pages, contiguous and under 4G if so
> + * requested.

I guess it avoids exchanging the pages if they already meet this
constraint?

The "if so requested" binds to "under 4G" and not "contiguous"? The
result is always contiguous I think. Also 4G is just an example and the
caller can ask for any order it likes.

The contents of the pages will be lost I think. Perhaps in debug builds
we should consider actually nuking the page's content in the case where
we don't actually exchange?

> The new pages are going to be "pinned": it's guaranteed
> + * that their p2m mapping won't be changed until explicitly "unpinned".

We have a privilege level check here through the use of
multipage_allocation_permitted, which permits up to 2MB mappings. But
unlike the original exchange these can be cumulative. Do we need a per
domain limit on the number of pinned p2m pages?

My gut feeling is yes, otherwise a guest can pin 1 page in every 2MB
range, which isn't so bad now but will cause fragmentation issues when
we switch to 2MB p2m mappings.

> [...]
> + *  4. If not all input exents are exchanged then the return code of this

"extents"

> + *     command will be non-zero.
> + *  5. THIS FIELD MUST BE INITIALISED TO ZERO BY THE CALLER!
> + */
> [...]
Ian Campbell
2013-Aug-08 10:52 UTC
Re: [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> guest_physmap_pin_range pins a range of guest pages so that they p2m

s/they/their/

> mappings won't be changed.
> guest_physmap_unpin_range unpins the previously pinned pages.
> 
> The pinning is done using one of the spare bits in the p2m ptes.
> 
> Provide empty stubs for x86.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> [...]
> +        if ( !third ||
> +             !third[third_table_offset(addr)].p2m.valid ||
> +             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )

At this point rc == -EFAULT, which doesn't seem right when failing the
P2M_DMA_PIN check.

> +            goto err;
> +
> +        pte = third[third_table_offset(addr)];
> +        pte.p2m.avail |= P2M_DMA_PIN;
> +        write_pte(&third[third_table_offset(addr)], pte);
> +
> +        addr += PAGE_SIZE;
> +    }
> +
> +    rc = 0;
> +
> +err:

Leaks final mapping of third I think?

You add two p2m walkers in this patch, and we have at least one
already. Perhaps it is time for a generic function which calls a
callback on the leaf ptes?

> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +
> +    return rc;
> +}
> +
> +int guest_physmap_unpin_range(struct domain *d,
> +                              xen_pfn_t gpfn,
> +                              unsigned int order)
> [...]

Same generic comments as the pin version.

> [...]
> @@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
>              cur_second_offset = second_table_offset(addr);
>          }
> 
> +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> +        {
> +            rc = -EINVAL;
> +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> +                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);

Not sure if a guest can trigger this, if yes then gdprintk.

Do we always clear this bit when we clear the present bit? If not then
we could end up with entries which are pinned and not-present. We
should probably avoid that one way or another!

> +            goto out;
> +        }
> +
>          flush = third[third_table_offset(addr)].p2m.valid;
> [...]
> +/* Xen "avail" bits allocation in third level entries */
> +#define P2M_DMA_PIN (1<<0)  /* The page has been "pinned": the hypervisor
> +                               promises not to change the p2m mapping. */

I guess we are going to want to keep type info here eventually, this
leaves 3 bits for that. For reference x86 has the following p2m types:

    p2m_ram_rw = 0,             /* Normal read/write guest RAM */
    p2m_invalid = 1,            /* Nothing mapped here */
    p2m_ram_logdirty = 2,       /* Temporarily read-only for log-dirty */
    p2m_ram_ro = 3,             /* Read-only; writes are silently dropped */
    p2m_mmio_dm = 4,            /* Reads and write go to the device model */
    p2m_mmio_direct = 5,        /* Read/write mapping of genuine MMIO area */
    p2m_populate_on_demand = 6, /* Place-holder for empty memory */

    /* Although these are defined in all builds, they can only
     * be used in 64-bit builds */
    p2m_grant_map_rw = 7,       /* Read/write grant mapping */
    p2m_grant_map_ro = 8,       /* Read-only grant mapping */
    p2m_ram_paging_out = 9,     /* Memory that is being paged out */
    p2m_ram_paged = 10,         /* Memory that has been paged out */
    p2m_ram_paging_in = 11,     /* Memory that is being paged in */
    p2m_ram_shared = 12,        /* Shared or sharable memory */
    p2m_ram_broken = 13,        /* Broken page, access cause domain crash */

We can get away with not having some of these for now, but not
necessarily in the long term I think. 4 available bits is really too
few :-(

I'm unsure if pinning is orthogonal to type. I think it probably is? Or
can we say that only normal ram can be pinned? Actually that may make
sense since many of the other types are either implicitly pinned
already (grant map, mmio etc) or pinning makes no sense (broken) or
pinning would have to change the type anyway (pin shared -> unshare ->
ram_rw +pinned, likewise for the paging types).

So the pinnable types are p2m_ram_rw, p2m_ram_ro, p2m_ram_logdirty? We
could perhaps declare that ro and logdirty are implicitly pinned too?
For log dirty it's pretty obvious, not so sure about ro -- doing DMA
even just from r/o memory seems a bit dodgy to allow.

Ian.
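The generic leaf walker Ian suggests might look roughly like the sketch
below. This is only an illustration of the idea, not code from the series:
the name p2m_walk_leaves and the callback type are invented here, and the
walk mirrors the structure of the pin/unpin functions quoted above
(including unmapping the third-level table that the posted version leaks).

/* Sketch only: call fn on each third-level pte covering gpfn..gpfn+2^order.
 * Pin, unpin and create_p2m_entries could all be expressed on top of it. */
typedef int (*p2m_leaf_fn_t)(lpae_t *leaf, void *data);

static int p2m_walk_leaves(struct domain *d, xen_pfn_t gpfn,
                           unsigned int order, p2m_leaf_fn_t fn, void *data)
{
    lpae_t *first, *second = NULL, *third = NULL;
    paddr_t addr = (paddr_t)gpfn << PAGE_SHIFT;
    struct p2m_domain *p2m = &d->arch.p2m;
    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
    unsigned long i;
    int rc = -EFAULT;

    spin_lock(&p2m->lock);

    first = __map_domain_page(p2m->first_level);
    if ( !first[first_table_offset(addr)].p2m.valid ||
         !first[first_table_offset(addr)].p2m.table )
        goto out;

    for ( i = 0; i < (1UL << order); i++ )
    {
        if ( cur_first_offset != first_table_offset(addr) )
        {
            if ( second ) unmap_domain_page(second);
            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
            cur_first_offset = first_table_offset(addr);
        }
        if ( !second[second_table_offset(addr)].p2m.valid ||
             !second[second_table_offset(addr)].p2m.table )
            goto out;

        if ( cur_second_offset != second_table_offset(addr) )
        {
            if ( third ) unmap_domain_page(third);
            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
            cur_second_offset = second_table_offset(addr);
        }

        /* Hand the leaf pte to the caller: pin, unpin, type change, ... */
        rc = fn(&third[third_table_offset(addr)], data);
        if ( rc )
            goto out;

        addr += PAGE_SIZE;
    }
    rc = 0;

 out:
    if ( third ) unmap_domain_page(third);
    if ( second ) unmap_domain_page(second);
    unmap_domain_page(first);
    spin_unlock(&p2m->lock);
    return rc;
}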
Jan Beulich
2013-Aug-08 11:43 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> @@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>          mfn = page_to_mfn(page);
>          guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
> 
> -        if ( !paging_mode_translate(d) )
> +        if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )

This is bogus: Whether the translation tables need updating
here shouldn't depend on the operation.

> @@ -505,6 +505,18 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>                                                  &mfn, 1) )
>                      rc = -EFAULT;
>              }
> +
> +            if ( op == XENMEM_get_dma_buf )
> +            {
> +                static int warning;

bool_t. Missing blank line. And this should go into the next scope,
or the two if-s should be combined.

> +                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> +                {
> +                    if (!warning) {

Coding style.

> +                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");

Is "not implemented" really the only failure mode?

> +static long put_dma_buf(XEN_GUEST_HANDLE_PARAM(xen_put_dma_buf_t) arg)
> +{
> +    int rc, i;

Please no signed variables for loops having an unsigned upper
bound. Even more, the bound is xen_ulong_t, i.e. the loop below
may end up running forever on 64-bit at least.

> @@ -630,8 +690,15 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> 
>          break;
> 
> +    case XENMEM_get_dma_buf:
> +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
> +         * just cast it and reuse memory_exchange */

Then why do you define a new type for it in the first place?

And anyway, the naming of the new sub-op isn't really
representing what the operation is capable of.
XENMEM_exchange_and_pin would seem to come much closer.

>      case XENMEM_exchange:
> -        rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
> +        rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t));
> +        break;
> +
> +    case XENMEM_put_dma_buf:
> +        rc = put_dma_buf(guest_handle_cast(arg, xen_put_dma_buf_t));

And similarly this should be XENMEM_unpin or some such.

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -459,6 +459,70 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
>   * The zero value is appropiate.
>   */
> 
> +#define XENMEM_get_dma_buf 26
> +/*
> + * This hypercall is similar to XENMEM_exchange: it exchanges the pages
> + * passed in with a new set of pages, contiguous and under 4G if so

The mentioning of 4G here is pointless - the hypercall can do all
sorts of address range restriction, so why mention this special
case?

Jan
Keir Fraser
2013-Aug-08 14:12 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On 08/08/2013 11:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

>> it exchanges the pages
>> + * passed in with a new set of pages, contiguous and under 4G if so
>> + * requested.
> 
> I guess it avoids exchanging the pages if they already meet this
> constraint?
> 
> The "if so requested" binds to "under 4G" and not "contiguous"? The
> result is always contiguous I think. Also 4G is just an example and the
> caller can ask for any order it likes.
> 
> The contents of the pages will be lost I think. Perhaps in debug builds
> we should consider actually nuking the page's content in the case where
> we don't actually exchange?

This confused me too, this talk of contiguity and being below 4GB. Doesn't
this new hypercall behave *exactly* like XENMEM_exchange, except for the
additional pinning of the MFNs? If so, just say that in the patch header and
code comments. The fact *you* currently use it to get memory below 4GB is an
application detail.

I wonder whether XENMEM_exchange is much use at all in the presence of
paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
could be suitably adjusted to have the pinning semantics in this case?

 -- Keir
Stefano Stabellini
2013-Aug-08 14:12 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Jan Beulich wrote:
> >>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > @@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >          mfn = page_to_mfn(page);
> >          guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
> > 
> > -        if ( !paging_mode_translate(d) )
> > +        if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )
> 
> This is bogus: Whether the translation tables need updating
> here shouldn't depend on the operation.

This is key: I need a hypercall that writes back the mfn of the new
buffer. The old hypercall doesn't do it. This time I specifically wrote
in a comment:

 * If return code is zero then @out.extent_list provides the MFNs of the
 * newly-allocated memory.

Also it makes sense for the old hypercall not to return the mfn to
autotranslate guests because the p2m mappings are not pinned, so it
can't guarantee that they won't change.

> > +            if ( op == XENMEM_get_dma_buf )
> > +            {
> > +                static int warning;
> 
> bool_t. Missing blank line. And this should go into the next scope,
> or the two if-s should be combined.
> 
> > +                if ( guest_physmap_pin_range(d, gpfn, exch.out.extent_order) )
> > +                {
> > +                    if (!warning) {
> 
> Coding style.
> 
> > +                        gdprintk(XENLOG_WARNING, "guest_physmap_pin_range not implemented\n");
> 
> Is "not implemented" really the only failure mode?

My mistake. I'll remove the warning and propagate the failure instead.

> > +static long put_dma_buf(XEN_GUEST_HANDLE_PARAM(xen_put_dma_buf_t) arg)
> > +{
> > +    int rc, i;
> 
> Please no signed variables for loops having an unsigned upper
> bound. Even more, the bound is xen_ulong_t, i.e. the loop below
> may end up running forever on 64-bit at least.

Argh, thanks for catching this.

> > +    case XENMEM_get_dma_buf:
> > +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
> > +         * just cast it and reuse memory_exchange */
> 
> Then why do you define a new type for it in the first place?

Because the original hypercall doesn't copy back the mfn if the guest
is an autotranslate guest.
And because it doesn't pin the p2m mappings.

> And anyway, the naming of the new sub-op isn't really
> representing what the operation is capable of.
> XENMEM_exchange_and_pin would seem to come much closer.

I like this suggestion. I could also keep xen_memory_exchange as
argument for XENMEM_exchange_and_pin, as Ian suggested.

> >      case XENMEM_exchange:
> > -        rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
> > +        rc = memory_exchange(op, guest_handle_cast(arg, xen_memory_exchange_t));
> > +        break;
> > +
> > +    case XENMEM_put_dma_buf:
> > +        rc = put_dma_buf(guest_handle_cast(arg, xen_put_dma_buf_t));
> 
> And similarly this should be XENMEM_unpin or some such.

OK.

> > --- a/xen/include/public/memory.h
> > +++ b/xen/include/public/memory.h
> > @@ -459,6 +459,70 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
> >   * The zero value is appropiate.
> >   */
> > 
> > +#define XENMEM_get_dma_buf 26
> > +/*
> > + * This hypercall is similar to XENMEM_exchange: it exchanges the pages
> > + * passed in with a new set of pages, contiguous and under 4G if so
> 
> The mentioning of 4G here is pointless - the hypercall can do all
> sorts of address range restriction, so why mention this special
> case?

Good point.
Stefano Stabellini
2013-Aug-08 14:16 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Keir Fraser wrote:
> On 08/08/2013 11:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
> 
> >> it exchanges the pages
> >> + * passed in with a new set of pages, contiguous and under 4G if so
> >> + * requested.
> > [...]
> 
> This confused me too, this talk of contiguity and being below 4GB. Doesn't
> this new hypercall behave *exactly* like XENMEM_exchange, except for the
> additional pinning of the MFNs? If so, just say that in the patch header and
> code comments. The fact *you* currently use it to get memory below 4GB is an
> application detail.

Yeah, you have a good point there. Trying to make the comment clearer I
actually confused it. I'll fix it.

> I wonder whether XENMEM_exchange is much use at all in the presence of
> paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
> could be suitably adjusted to have the pinning semantics in this case?

It's not just about pinning, we also need to get the mfn back even if
the guest is autotranslate, which XENMEM_exchange doesn't do.
Tim Deegan
2013-Aug-08 14:26 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
At 15:16 +0100 on 08 Aug (1375974960), Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Keir Fraser wrote:
> > [...]
> > I wonder whether XENMEM_exchange is much use at all in the presence of
> > paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
> > could be suitably adjusted to have the pinning semantics in this case?
> 
> It's not just about pinning, we also need to get the mfn back even if
> the guest is autotranslate, which XENMEM_exchange doesn't do.

I'd prefer if this were phrased as a DMA address, or something like
that. On systems with an IOMMU (e.g. x86) this address might not be an
MFN.

Tim.
Ian Campbell
2013-Aug-08 14:27 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 2013-08-08 at 15:16 +0100, Stefano Stabellini wrote:
> [...]
> It's not just about pinning, we also need to get the mfn back even if
> the guest is autotranslate, which XENMEM_exchange doesn't do.

Plus current users won't be expecting to have to unpin when they are
done, so just adding pinning to the existing function won't work. The
struct doesn't have a flags field so we can't do it that way either.

I suppose we could deprecate the existing exchange in favour of one
with a flags field, but that doesn't seem any better than renaming this
new addition XENMEM_exchange_and_pin.

Ian.
Jan Beulich
2013-Aug-08 14:38 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 08.08.13 at 16:12, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 8 Aug 2013, Jan Beulich wrote:
>> >>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>> > @@ -496,7 +496,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>> >          mfn = page_to_mfn(page);
>> >          guest_physmap_add_page(d, gpfn, mfn, exch.out.extent_order);
>> > 
>> > -        if ( !paging_mode_translate(d) )
>> > +        if ( op == XENMEM_get_dma_buf || !paging_mode_translate(d) )
>> 
>> This is bogus: Whether the translation tables need updating
>> here shouldn't depend on the operation.
> 
> This is key: I need a hypercall that writes back the mfn of the new
> buffer. The old hypercall doesn't do it. This time I specifically wrote
> in a comment:
> 
>  * If return code is zero then @out.extent_list provides the MFNs of the
>  * newly-allocated memory.
> 
> Also it makes sense for the old hypercall not to return the mfn to
> autotranslate guests because the p2m mappings are not pinned, so it
> can't guarantee that they won't change.

Oh, you're talking about one half of the if() body, and I'm talking
about the other (the part that was visible from the patch context):
I agree that copying back the MFNs is useful/necessary here. But
I don't see why you'd want to call set_gpfn_from_mfn() - that
should have been taken care of already. So perhaps split the one if
into two?

>> > +    case XENMEM_get_dma_buf:
>> > +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
>> > +         * just cast it and reuse memory_exchange */
>> 
>> Then why do you define a new type for it in the first place?
> 
> Because the original hypercall doesn't copy back the mfn if the guest
> is an autotranslate guest.
> And because it doesn't pin the p2m mappings.

Hmm, I'm confused: Here you reason why you want a new type
(note that I said "type", not "sub-hypercall"), ...

>> And anyway, the naming of the new sub-op isn't really
>> representing what the operation is capable of.
>> XENMEM_exchange_and_pin would seem to come much closer.
> 
> I like this suggestion. I could also keep xen_memory_exchange as
> argument for XENMEM_exchange_and_pin, as Ian suggested.

... yet here you agree that xen_memory_exchange could be
re-used.

Jan
Stefano Stabellini
2013-Aug-08 14:44 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Jan Beulich wrote:
> >>> On 08.08.13 at 16:12, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > [...]
> 
> Oh, you're talking about one half of the if() body, and I'm talking
> about the other (the part that was visible from the patch context):
> I agree that copying back the MFNs is useful/necessary here. But
> I don't see why you'd want to call set_gpfn_from_mfn() - that
> should have been taken care of already. So perhaps split the one if
> into two?

Right, good suggestion.

> >> And anyway, the naming of the new sub-op isn't really
> >> representing what the operation is capable of.
> >> XENMEM_exchange_and_pin would seem to come much closer.
> > 
> > I like this suggestion. I could also keep xen_memory_exchange as
> > argument for XENMEM_exchange_and_pin, as Ian suggested.
> 
> ... yet here you agree that xen_memory_exchange could be
> re-used.

I meant only to keep the same parameter struct xen_memory_exchange.
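Concretely, the split Jan asks for (together with Ian's earlier point that
pinning should happen before the MFN copy-back) might shape the loop body
like the sketch below. This is illustrative only, using the
XENMEM_exchange_and_pin name proposed above rather than code from any
posted version; the surrounding variable names follow the quoted
memory_exchange() code.

/* Sketch: m2p update depends only on the paging mode; the MFN copy-back
 * also happens for the new sub-op, but only after pinning succeeded, and
 * pinning errors are propagated instead of merely logged. */
if ( !paging_mode_translate(d) )
{
    for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
        set_gpfn_from_mfn(mfn + k, gpfn + k);
}

if ( op == XENMEM_exchange_and_pin )
    rc = guest_physmap_pin_range(d, gpfn, exch.out.extent_order);

if ( !rc && (op == XENMEM_exchange_and_pin || !paging_mode_translate(d)) )
{
    if ( __copy_to_guest_offset(exch.out.extent_start,
                                (i << out_chunk_order) + j, &mfn, 1) )
        rc = -EFAULT;
}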
Keir Fraser
2013-Aug-08 14:47 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On 08/08/2013 15:16, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:

>> This confused me too, this talk of contiguity and being below 4GB. Doesn't
>> this new hypercall behave *exactly* like XENMEM_exchange, except for the
>> additional pinning of the MFNs? If so, just say that in the patch header and
>> code comments. The fact *you* currently use it to get memory below 4GB is an
>> application detail.
> 
> Yeah, you have a good point there. Trying to make the comment clearer I
> actually confused it. I'll fix it.
> 
>> I wonder whether XENMEM_exchange is much use at all in the presence of
>> paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
>> could be suitably adjusted to have the pinning semantics in this case?
> 
> It's not just about pinning, we also need to get the mfn back even if
> the guest is autotranslate, which XENMEM_exchange doesn't do.

Ok, fine by me.

 -- Keir
Jan Beulich
2013-Aug-08 14:50 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 08.08.13 at 16:44, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 8 Aug 2013, Jan Beulich wrote:
>> Hmm, I'm confused: Here you reason why you want a new type
>> (note that I said "type", not "sub-hypercall"), ...
>> [...]
>> ... yet here you agree that xen_memory_exchange could be
>> re-used.
> 
> I meant only to keep the same parameter struct xen_memory_exchange.

And that's what I asked for.

Jan
Stefano Stabellini
2013-Aug-08 14:54 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Jan Beulich wrote:
> >>> On 08.08.13 at 16:44, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > On Thu, 8 Aug 2013, Jan Beulich wrote:
> >> >>> On 08.08.13 at 16:12, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Thu, 8 Aug 2013, Jan Beulich wrote:
> >> >> >>> On 05.08.13 at 18:40, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> >> >> > @@ -630,8 +690,15 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >> >> >
> >> >> >          break;
> >> >> >
> >> >> > +    case XENMEM_get_dma_buf:
> >> >> > +        /* xen_get_dma_buf_t is identical to xen_memory_exchange_t, so
> >> >> > +         * just cast it and reuse memory_exchange */
> >> >>
> >> >> Then why do you define a new type for it in the first place?
> >> >
> >> > Because the original hypercall doesn't copy back the mfn if the guest
> >> > is an autotranslate guest.
> >> > And because it doesn't pin the p2m mappings.
> >>
> >> Hmm, I'm confused: Here you reason why you want a new type
> >> (note that I said "type", not "sub-hypercall"), ...
> >>
> >> >> And anyway, the naming of the new sub-op isn't really
> >> >> representing what the operation is capable of.
> >> >> XENMEM_exchange_and_pin would seem to come much closer.
> >> >
> >> > I like this suggestion. I could also keep xen_memory_exchange as
> >> > argument for XENMEM_exchange_and_pin, as Ian suggested.
> >>
> >> ... yet here you agree that xen_memory_exchange could be
> >> re-used.
> >
> > I meant only to keep the same parameter struct xen_memory_exchange.
>
> And that's what I asked for.

OK, I'll try to be clearer: I think it's reasonable to have a new
hypercall, called XENMEM_exchange_and_pin, using the same parameter
struct as XENMEM_exchange, struct xen_memory_exchange.
Actually Ian suggested this.

Do you agree?

If not, given that the parameters are actually the same, do you think I
should duplicate and rename the struct as I was doing in this patch?
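[For concreteness, a sketch of what the agreed interface could look like in xen/include/public/memory.h. The subop numbers below are placeholders, not the values eventually assigned:]

    /* Hypothetical illustration only. */
    #define XENMEM_exchange_and_pin     26
    #define XENMEM_unpin                27

    /* Both subops take the existing struct xen_memory_exchange, so no new
     * argument structure is needed.  XENMEM_exchange_and_pin behaves like
     * XENMEM_exchange but additionally pins the replacement pages and
     * copies their addresses back via @out.extent_list even for
     * autotranslated guests; XENMEM_unpin releases the pinning. */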
Jan Beulich
2013-Aug-08 14:58 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
>>> On 08.08.13 at 16:54, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> OK, I'll try to be clearer: I think it's reasonable to have a new
> hypercall, called XENMEM_exchange_and_pin, using the same parameter
> struct as XENMEM_exchange, struct xen_memory_exchange.
> Actually Ian suggested this.
>
> Do you agree?

Yes.

> If not, given that the parameters are actually the same, do you think I
> should duplicate and rename the struct as I was doing in this patch?

It was actually this duplication that I wanted to see removed.

Jan
Stefano Stabellini
2013-Aug-08 15:04 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Tim Deegan wrote:
> At 15:16 +0100 on 08 Aug (1375974960), Stefano Stabellini wrote:
> > On Thu, 8 Aug 2013, Keir Fraser wrote:
> > > On 08/08/2013 11:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
> > >
> > > >>                        it exchanges the pages
> > > >> + * passed in with a new set of pages, contiguous and under 4G if so
> > > >> + * requested.
> > > >
> > > > I guess it avoids exchanging the pages if they already meet this
> > > > constraint?
> > > >
> > > > The "if so requested" binds to "under 4G" and not "contiguous"? The
> > > > result is always contiguous I think. Also 4G is just an example and the
> > > > caller can ask for any order it likes.
> > > >
> > > > The contents of the pages will be lost I think. Perhaps in debug builds
> > > > we should consider actually nuking the page's content in the case where
> > > > we don't actually exchange?
> > >
> > > This confused me too, this talk of contiguity and being below 4GB. Doesn't
> > > this new hypercall behave *exactly* like XENMEM_exchange, except for the
> > > additional pinning of the MFNs? If so, just say that in the patch header and
> > > code comments. The fact *you* currently use it to get memory below 4GB is an
> > > application detail.
> >
> > Yeah, you have a good point there. Trying to make the comment clearer I
> > actually confused it. I'll fix it.
> >
> > > I wonder whether XENMEM_exchange is much use at all in the presence of
> > > paging, without the additional semantics of pinning? Perhaps XENMEM_exchange
> > > could be suitably adjusted to have the pinning semantics in this case?
> >
> > It's not just about pinning: we also need to get the mfn back even if
> > the guest is autotranslate, which XENMEM_exchange doesn't do.
>
> I'd prefer if this were phrased as a DMA address, or something like
> that. On systems with an IOMMU (e.g. x86) this address might not be an
> MFN.

I can see why we would want to make that clear. I'll write DMA address
in the comment.
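[For concreteness, one possible wording for the revised interface comment, folding in Tim's point. This is a sketch only, not the wording eventually committed:]

    /*
     * On success, @out.extent_list is filled with the DMA addresses of
     * the replacement pages: on systems with an IOMMU (e.g. x86) these
     * are not necessarily MFNs.
     */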
Ian Campbell
2013-Aug-08 15:25 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
>  xen/include/public/memory.h | 64 +++++++++++++++++++++++++++++++++++++

Since I've just applied 73f18583dd82 "xen: arm: document which
hypercalls (and subops) are supported on ARM" please can you also add
this new call there.

Ian.
Konrad Rzeszutek Wilk
2013-Aug-09 15:46 UTC
Re: [PATCH v3 1/3] xen/arm: implement steal_page
On Mon, Aug 05, 2013 at 05:40:54PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> ---
>  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index f301e65..ea64c03 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -751,6 +751,49 @@ int donate_page(struct domain *d, struct page_info *page, unsigned int memflags)
>  int steal_page(
>      struct domain *d, struct page_info *page, unsigned int memflags)
>  {
> +    unsigned long x, y;
> +    bool_t drop_dom_ref = 0;

Shouldn't this be true or false?

> +
> +    spin_lock(&d->page_alloc_lock);
> +
> +    if ( is_xen_heap_page(page) || (page_get_owner(page) != d) )
> +        goto fail;
> +
> +    /*
> +     * We require there is just one reference (PGC_allocated). We temporarily
> +     * drop this reference now so that we can safely swizzle the owner.
> +     */
> +    y = page->count_info;
> +    do {
> +        x = y;
> +        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
> +            goto fail;
> +        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
> +    } while ( y != x );
> +
> +    /* Swizzle the owner then reinstate the PGC_allocated reference. */
> +    page_set_owner(page, NULL);
> +    y = page->count_info;
> +    do {
> +        x = y;
> +        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
> +    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
> +
> +    /* Unlink from original owner. */
> +    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
> +        drop_dom_ref = 1;
> +    page_list_del(page, &d->page_list);
> +
> +    spin_unlock(&d->page_alloc_lock);
> +    if ( unlikely(drop_dom_ref) )
> +        put_domain(d);
> +    return 0;
> +
> + fail:
> +    spin_unlock(&d->page_alloc_lock);
> +    printk("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%lx\n",
> +           (void *)page_to_mfn(page), d, d->domain_id,
> +           page_get_owner(page), page->count_info, page->u.inuse.type_info);
>      return -1;
>  }
>
> --
> 1.7.2.5
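[As an aside for readers new to the idiom: both loops in the quoted hunk are lock-free read-modify-write updates of page->count_info. A generic sketch of the pattern, where still_valid() and new_value() are hypothetical placeholders:]

    unsigned long x, y = page->count_info;
    do {
        x = y;                       /* work on a snapshot of the word */
        if ( !still_valid(x) )       /* hypothetical validity check */
            goto fail;
        /* cmpxchg returns the old value: if it differs from our
         * snapshot, another CPU raced with us, so retry from there. */
        y = cmpxchg(&page->count_info, x, new_value(x));
    } while ( y != x );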
Stefano Stabellini
2013-Aug-13 16:05 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Ian Campbell wrote:
> >                 The new pages are going to be "pinned": it's guaranteed
> > + * that their p2m mapping won't be changed until explicitly "unpinned".
>
> We have a privilege level check here through the use of
> multipage_allocation_permitted, which permits up to 2MB mappings. But
> unlike the original exchange these can be cumulative. Do we need a per
> domain limit on the number of pinned p2m pages? My gut feeling is yes,
> otherwise a guest can pin 1 page in every 2MB range, which isn't so bad
> now but will cause fragmentation issues when we switch to 2MB p2m
> mappings.

Wouldn't it cause fragmentation issues only to itself?

As far as I can tell the only problem that pinned p2m pages can cause to
the hypervisor is preventing Xen from swapping or sharing domain pages.
Ian Campbell
2013-Aug-13 22:07 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Tue, 2013-08-13 at 17:05 +0100, Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Ian Campbell wrote:
> > >                 The new pages are going to be "pinned": it's guaranteed
> > > + * that their p2m mapping won't be changed until explicitly "unpinned".
> >
> > We have a privilege level check here through the use of
> > multipage_allocation_permitted, which permits up to 2MB mappings. But
> > unlike the original exchange these can be cumulative. Do we need a per
> > domain limit on the number of pinned p2m pages? My gut feeling is yes,
> > otherwise a guest can pin 1 page in every 2MB range, which isn't so bad
> > now but will cause fragmentation issues when we switch to 2MB p2m
> > mappings.
>
> Wouldn't it cause fragmentation issues only to itself?

What I meant is that if a guest pins one page out of the 512 in a 2MB
mapping then the other 511 pages can't easily/efficiently be used for
other guests even though they are "free".

I don't know if it would be unreasonable to account the full 2MB range
to the guest until it has freed it all, perhaps with some sort of
per-domain "free" list so that the guest can cause itself pain by
reusing them. Needs some thought...

> As far as I can tell the only problem that pinned p2m pages can cause to
> the hypervisor is preventing Xen from swapping or sharing domain pages.

You are right about swapping, but pinning would necessarily imply an
unshare I think. Is there any point in a r/o pin? I suspect not.

Ian.
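[To put rough numbers on the concern, a back-of-the-envelope illustration assuming 4KB pages and 2MB second-level superpage p2m mappings:]

    2MB superpage                = 512 x 4KB pages
    1GB of guest RAM             = 512 superpages
    1 pinned page per superpage  = 512 x 4KB = 2MB actually pinned,
    yet the whole 1GB of that p2m range must stay in 4KB mappings.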
Stefano Stabellini
Re: [PATCH v3 1/3] xen/arm: implement steal_page
On Fri, 9 Aug 2013, Konrad Rzeszutek Wilk wrote:
> On Mon, Aug 05, 2013 at 05:40:54PM +0100, Stefano Stabellini wrote:
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > ---
> >  xen/arch/arm/mm.c |   43 +++++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 43 insertions(+), 0 deletions(-)
> >
> > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > index f301e65..ea64c03 100644
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -751,6 +751,49 @@ int donate_page(struct domain *d, struct page_info *page, unsigned int memflags)
> >  int steal_page(
> >      struct domain *d, struct page_info *page, unsigned int memflags)
> >  {
> > +    unsigned long x, y;
> > +    bool_t drop_dom_ref = 0;
>
> Shouldn't this be true or false?

Actually, in Xen we use 0 and 1 with bool_t.
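[For reference, bool_t in the Xen tree of this era is a plain integer typedef, approximately as below; treat the exact definition as an assumption:]

    /* xen/include/xen/types.h, approximate: */
    typedef char bool_t;   /* used with 0 and 1 by convention */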
Stefano Stabellini
2013-Aug-14 16:08 UTC
Re: [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
On Thu, 8 Aug 2013, Ian Campbell wrote:
> > +int guest_physmap_pin_range(struct domain *d,
> > +                            xen_pfn_t gpfn,
> > +                            unsigned int order)
> > +{
> > +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> > +    paddr_t addr = gpfn << PAGE_SHIFT;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    int rc = -EFAULT, i;
> > +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> > +
> > +    spin_lock(&p2m->lock);
> > +
> > +    first = __map_domain_page(p2m->first_level);
> > +
> > +    if ( !first ||
> > +         !first[first_table_offset(addr)].p2m.valid ||
> > +         !first[first_table_offset(addr)].p2m.table )
> > +        goto err;
> > +
> > +    for ( i = 0; i < (1UL << order); i++ )
> > +    {
> > +        if ( cur_first_offset != first_table_offset(addr) )
> > +        {
> > +            if (second) unmap_domain_page(second);
> > +            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
> > +            cur_first_offset = first_table_offset(addr);
> > +        }
> > +        if ( !second ||
> > +             !second[second_table_offset(addr)].p2m.valid ||
> > +             !second[second_table_offset(addr)].p2m.table )
> > +            goto err;
> > +
> > +        if ( cur_second_offset != second_table_offset(addr) )
> > +        {
> > +            if (third) unmap_domain_page(third);
> > +            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
> > +            cur_second_offset = second_table_offset(addr);
> > +        }
> > +        if ( !third ||
> > +             !third[third_table_offset(addr)].p2m.valid ||
> > +             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
>
> At this point rc == -EFAULT, which doesn't seem right when failing the
> P2M_DMA_PIN check.

I'll go with -EINVAL. Do you have any better suggestions otherwise?

> > +            goto err;
> > +
> > +        pte = third[third_table_offset(addr)];
> > +        pte.p2m.avail |= P2M_DMA_PIN;
> > +        write_pte(&third[third_table_offset(addr)], pte);
> > +
> > +        addr += PAGE_SIZE;
> > +    }
> > +
> > +    rc = 0;
> > +
> > +err:
>
> Leaks final mapping of third I think?
>
> You add two p2m walkers in this patch, and we have at least one already.
> Perhaps it is time for a generic function which calls a callback on the
> leaf ptes?

OK, good idea.

> > +    if ( second ) unmap_domain_page(second);
> > +    if ( first ) unmap_domain_page(first);
> > +
> > +    spin_unlock(&p2m->lock);
> > +
> > +    return rc;
> > +}
> > +
> > +int guest_physmap_unpin_range(struct domain *d,
> > +                              xen_pfn_t gpfn,
> > +                              unsigned int order)
> > +{
> > +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> > +    paddr_t addr = gpfn << PAGE_SHIFT;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    int rc = -EFAULT, i;
> > +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> [...]
>
> Same generic comments as the pin version.
>
> > +    rc = 0;
> > +
> > +err:
> > +    if ( second ) unmap_domain_page(second);
> > +    if ( first ) unmap_domain_page(first);
> > +
> > +    spin_unlock(&p2m->lock);
> > +
> > +    return rc;
> > +}
> > +
> >  int guest_physmap_mark_populate_on_demand(struct domain *d,
> >                                            unsigned long gfn,
> >                                            unsigned int order)
> > @@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
> >              cur_second_offset = second_table_offset(addr);
> >          }
> >
> > +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> > +        {
> > +            rc = -EINVAL;
> > +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> > +                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);
>
> Not sure if a guest can trigger this, if yes then gdprintk.
>
> Do we always clear this bit when we clear the present bit? If not then
> we could end up with entries which are pinned and not-present. We should
> probably avoid that one way or another!

I went through the code: yes we do, because we always reset the entire
pte entry rather than only the valid bit.

> > +            goto out;
> > +        }
> > +
> >          flush = third[third_table_offset(addr)].p2m.valid;
> >
> >          /* Allocate a new RAM page and attach */
> > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> > index 5e7c5a3..8db8816 100644
> > --- a/xen/include/asm-arm/mm.h
> > +++ b/xen/include/asm-arm/mm.h
> > @@ -319,6 +319,10 @@ void free_init_memory(void);
> >
> >  int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
> >                                            unsigned int order);
> > +int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn,
> > +                            unsigned int order);
> > +int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn,
> > +                              unsigned int order);
> >
> >  extern void put_page_type(struct page_info *page);
> >  static inline void put_page_and_type(struct page_info *page)
> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index 41e9eff..12087d0 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -153,11 +153,15 @@ typedef struct {
> >      unsigned long hint:1;       /* In a block of 16 contiguous entries */
> >      unsigned long sbz2:1;
> >      unsigned long xn:1;         /* eXecute-Never */
> > -    unsigned long avail:4;      /* Ignored by hardware */
> > +    unsigned long avail:4;      /* Ignored by hardware, see below */
> >
> >      unsigned long sbz1:5;
> >  } __attribute__((__packed__)) lpae_p2m_t;
> >
> > +/* Xen "avail" bits allocation in third level entries */
> > +#define P2M_DMA_PIN  (1<<0)  /* The page has been "pinned": the hypervisor
> > +                                promises not to change the p2m mapping. */
>
> I guess we are going to want to keep type info here eventually, this
> leaves 3 bits for that.
>
> For reference x86 has the following p2m types:
>     p2m_ram_rw = 0,             /* Normal read/write guest RAM */
>     p2m_invalid = 1,            /* Nothing mapped here */
>     p2m_ram_logdirty = 2,       /* Temporarily read-only for log-dirty */
>     p2m_ram_ro = 3,             /* Read-only; writes are silently dropped */
>     p2m_mmio_dm = 4,            /* Reads and write go to the device model */
>     p2m_mmio_direct = 5,        /* Read/write mapping of genuine MMIO area */
>     p2m_populate_on_demand = 6, /* Place-holder for empty memory */
>
>     /* Although these are defined in all builds, they can only
>      * be used in 64-bit builds */
>     p2m_grant_map_rw = 7,       /* Read/write grant mapping */
>     p2m_grant_map_ro = 8,       /* Read-only grant mapping */
>     p2m_ram_paging_out = 9,     /* Memory that is being paged out */
>     p2m_ram_paged = 10,         /* Memory that has been paged out */
>     p2m_ram_paging_in = 11,     /* Memory that is being paged in */
>     p2m_ram_shared = 12,        /* Shared or sharable memory */
>     p2m_ram_broken = 13,        /* Broken page, access cause domain crash */
>
> We can get away with not having some of these for now, but not
> necessarily in the long term I think. 4 available bits is really too
> few :-(
>
> I'm unsure if pinning is orthogonal to type. I think it probably is? Or
> can we say that only normal ram can be pinned? Actually that may make
> sense since many of the other types are either implicitly pinned already
> (grant map, mmio etc) or pinning makes no sense (broken) or pinning
> would have to change the type anyway (pin shared -> unshare -> ram_rw
> +pinned, likewise for the paging types).
>
> So the pinnable types are p2m_ram_rw, p2m_ram_ro, p2m_ram_logdirty? We
> could perhaps declare that ro and logdirty are implicitly pinned too?
> For log dirty it's pretty obvious, not so sure about ro -- doing DMA
> even just from r/o memory seems a bit dodgy to allow.

I agree. I think that only p2m_ram_rw should be pinnable. I can add
that to the comment.
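[For illustration, a minimal sketch of the generic leaf-PTE walker being agreed on here -- every name and signature below is hypothetical, not actual Xen code -- which also closes the leak Ian spotted by unmapping third on the common exit path:]

    /* Hypothetical: walk the third-level (leaf) p2m entries backing
     * [gpfn, gpfn + (1 << order)) and apply cb() to each one. */
    typedef int (*p2m_leaf_fn_t)(lpae_t *ptep, void *arg);

    static int p2m_walk_leaf_ptes(struct domain *d, xen_pfn_t gpfn,
                                  unsigned int order,
                                  p2m_leaf_fn_t cb, void *arg)
    {
        lpae_t *first, *second = NULL, *third = NULL;
        paddr_t addr = (paddr_t)gpfn << PAGE_SHIFT;
        struct p2m_domain *p2m = &d->arch.p2m;
        unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
        unsigned long i;
        int rc = -EFAULT;

        spin_lock(&p2m->lock);

        first = __map_domain_page(p2m->first_level);

        for ( i = 0; i < (1UL << order); i++, addr += PAGE_SIZE )
        {
            if ( !first[first_table_offset(addr)].p2m.valid ||
                 !first[first_table_offset(addr)].p2m.table )
                goto err;

            if ( cur_first_offset != first_table_offset(addr) )
            {
                if ( second ) unmap_domain_page(second);
                second = map_domain_page(first[first_table_offset(addr)].p2m.base);
                cur_first_offset = first_table_offset(addr);
            }
            if ( !second[second_table_offset(addr)].p2m.valid ||
                 !second[second_table_offset(addr)].p2m.table )
                goto err;

            if ( cur_second_offset != second_table_offset(addr) )
            {
                if ( third ) unmap_domain_page(third);
                third = map_domain_page(second[second_table_offset(addr)].p2m.base);
                cur_second_offset = second_table_offset(addr);
            }

            if ( (rc = cb(&third[third_table_offset(addr)], arg)) )
                goto err;
        }

        rc = 0;

    err:
        if ( third ) unmap_domain_page(third);   /* previously leaked */
        if ( second ) unmap_domain_page(second);
        unmap_domain_page(first);

        spin_unlock(&p2m->lock);

        return rc;
    }

    /* Pinning then reduces to a small callback: */
    static int pin_one_pte(lpae_t *ptep, void *arg)
    {
        lpae_t pte = *ptep;

        /* Invalid or already-pinned entries fail; the exact error code
         * (-EINVAL vs -EBUSY) is debated elsewhere in this thread. */
        if ( !pte.p2m.valid || (pte.p2m.avail & P2M_DMA_PIN) )
            return -EBUSY;

        pte.p2m.avail |= P2M_DMA_PIN;
        write_pte(ptep, pte);
        return 0;
    }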
Stefano Stabellini
2013-Aug-14 16:17 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Thu, 8 Aug 2013, Ian Campbell wrote:
> On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> >  xen/include/public/memory.h | 64 +++++++++++++++++++++++++++++++++++++
>
> Since I've just applied 73f18583dd82 "xen: arm: document which
> hypercalls (and subops) are supported on ARM" please can you also add
> this new call there.

Actually you have already written:

 * HYPERVISOR_memory_op
 *  All generic sub-operations.

XENMEM_exchange_and_pin and XENMEM_unpin are generic subops.
Ian Campbell
2013-Aug-15 14:32 UTC
Re: [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range
On Wed, 2013-08-14 at 17:08 +0100, Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Ian Campbell wrote:
> > > +        if ( !third ||
> > > +             !third[third_table_offset(addr)].p2m.valid ||
> > > +             third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> >
> > At this point rc == -EFAULT, which doesn't seem right when failing the
> > P2M_DMA_PIN check.
>
> I'll go with -EINVAL. Do you have any better suggestions otherwise?

-EBUSY sort of fits here I guess?

> > > +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> > > +        {
> > > +            rc = -EINVAL;
> > > +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> > > +                   " domid=%d, the page is pinned\n", __func__, addr, d->domain_id);
> >
> > Not sure if a guest can trigger this, if yes then gdprintk.
> >
> > Do we always clear this bit when we clear the present bit? If not then
> > we could end up with entries which are pinned and not-present. We should
> > probably avoid that one way or another!
>
> I went through the code: yes we do, because we always reset the entire
> pte entry rather than only the valid bit.

Good!

Ian.
Ian Campbell
2013-Aug-15 14:32 UTC
Re: [PATCH v3 3/3] xen: introduce XENMEM_get_dma_buf and XENMEM_put_dma_buf
On Wed, 2013-08-14 at 17:17 +0100, Stefano Stabellini wrote:
> On Thu, 8 Aug 2013, Ian Campbell wrote:
> > On Mon, 2013-08-05 at 17:40 +0100, Stefano Stabellini wrote:
> > >  xen/include/public/memory.h | 64 +++++++++++++++++++++++++++++++++++++
> >
> > Since I've just applied 73f18583dd82 "xen: arm: document which
> > hypercalls (and subops) are supported on ARM" please can you also add
> > this new call there.
>
> Actually you have already written:
>
>  * HYPERVISOR_memory_op
>  *  All generic sub-operations.
>
> XENMEM_exchange_and_pin and XENMEM_unpin are generic subops.

Perfect ;-)