thr3ads.net - Xen devel - [PATCH v6 0/5] introduce XENMEM_exchange_and_pin, XENMEM_unpin and XENMEM

If this information is useful, please help other people find it:
Share via:

Stefano Stabellini

2013-Sep-27 16:15 UTC

[PATCH v6 0/5] introduce XENMEM_exchange_and_pin, XENMEM_unpin and XENMEM_pin

Hi all,
this patch series introduces three new hypercalls to allow autotranslate
guests to allocate a contiguous buffer in machine addresses.
The XENMEM_exchange_and_pin returns the mfns and makes sure to pin the
pages so that the hypervisor won''t change their p2m mappings while in
use.
XENMEM_pin allows a guest to pin one or more pages whose machine
addresses respect the number of bits addressable by the caller as
expressed by the address_bits field.
XENMEM_unpin simply unpins the pages.

Cheers,

Stefano


Changes in v6:
- replace gdprintk with dprintk(XENLOG_G_ in donate_page;
- replace printk with dprintk(XENLOG_G_ in steal_page;
- remove casting of page_to_mfn to void*, use %lx instead;
- add "steal" to the steal_page gdprintk;
- replace 1 with (-PGC_count_mask & PGC_count_mask) in donate_page;
- improve p2m_walker description;
- cast 1UL to (paddr_t);
- handle the first level within the loop;
- use literal 1, 2, 3 for the levels;
- add missing continue in case of superpages;
- handle third level leaves with !table as errors;
- ASSERT second and third are not NULL after being mapped;
- guest_physmap_pin_range: check that the pages are normal r/w ram
pages;
- guest_physmap_unpin_range: return -EINVAL instead of -EBUSY if a page
is not pinned;
- use a count_info flag rather than using spare bits in the ptes;
- do not change error paths in memory_exchange;
- crash the guest on pinning failure;
- use the right op with hypercall_create_continuation;
- comments in public/memory.h should precede the #define;
- make unpin return int;
- improve the description of the new hypercalls;
- make sure to copy back the right value of nr_unpinned even on early
error paths;
- add a XENMEM_pin hypercall.

Changes in v5:
- move donate_page to common code;
- update commit message;
- align tests;
- comment p2m_walker;
- fix return codes in p2m_walker;
- handle superpages in p2m_walker;
- rename _p2m_lookup to p2m_lookup_f;
- return -EBUSY when the P2M_DMA_PIN check fails;
- rename _guest_physmap_pin_range to pin_one_pte;
- rename _guest_physmap_unpin_range to unpin_one_pte;
- memory_exchange: handle guest_physmap_pin_range failures;
- make i an unsigned long in unpinned;
- add nr_unpinned to xen_unpin to report partial success.

Changes in v4:
- move steal_page to common code;
- introduce a generic p2m walker and use it in p2m_lookup,
  guest_physmap_pin_range and guest_physmap_unpin_range;
- return -EINVAL when the P2M_DMA_PIN check fails;
- change the printk into a gdprintk;
- add a comment on what type of page can be pinned;
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_get_dma_buf to XENMEM_unpin;
- move the pinning before we copy back the mfn to the guest;
- propagate errors returned by guest_physmap_pin_range;
- use xen_memory_exchange_t as parameter for XENMEM_exchange_and_pin;
- use an unsigned iterator in unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange.

Changes in v3:
- implement guest_physmap_pin_range and guest_physmap_unpin_range on
  ARM;
- implement XENMEM_put_dma_buf.

Changes in v2:
- actually don''t print the warning more than once.



Stefano Stabellini (5):
      xen: move steal_page and donate_page to common code
      xen/arm: introduce a generic p2m walker and use it in p2m_lookup
      xen: implement guest_physmap_pin_range and guest_physmap_unpin_range
      xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
      xen: introduce XENMEM_pin

 xen/arch/arm/mm.c           |   12 --
 xen/arch/arm/p2m.c          |  189 +++++++++++++++++++++++++----
 xen/arch/x86/mm.c           |   85 -------------
 xen/common/memory.c         |  278 ++++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/mm.h    |   12 +-
 xen/include/asm-x86/mm.h    |    5 -
 xen/include/asm-x86/p2m.h   |   12 ++
 xen/include/public/memory.h |   85 +++++++++++++
 xen/include/xen/mm.h        |    3 +
 9 files changed, 545 insertions(+), 136 deletions(-)

Stefano Stabellini

2013-Sep-27 16:15 UTC

head link

[PATCH v6 1/5] xen: move steal_page and donate_page to common code

Only code movement, except for a small change to the warning printed in
case of an error in donate_page.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: keir@xen.org
CC: JBeulich@suse.com

Changes in v6:
- insert a missing black;
- replace gdprintk with dprintk(XENLOG_G_ in donate_page;
- replace printk with dprintk(XENLOG_G_ in steal_page;
- remove casting of page_to_mfn to void*, use %lx instead;
- add "steal" to the steal_page gdprintk;
- replace 1 with (-PGC_count_mask & PGC_count_mask) in donate_page.

Changes in v5:
- move donate_page to common code;
- update commit message.

Changes in v4:
- move steal_page to common code.
---
 xen/arch/arm/mm.c        |   12 ------
 xen/arch/x86/mm.c        |   85 --------------------------------------------
 xen/common/memory.c      |   88 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/mm.h |    5 ---
 xen/include/asm-x86/mm.h |    5 ---
 xen/include/xen/mm.h     |    3 ++
 6 files changed, 91 insertions(+), 107 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index cc084ec..44ec0e3 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -860,18 +860,6 @@ void arch_dump_shared_mem_info(void)
 {
 }
 
-int donate_page(struct domain *d, struct page_info *page, unsigned int
memflags)
-{
-    ASSERT(0);
-    return -ENOSYS;
-}
-
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    return -1;
-}
-
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 {
     ASSERT(0);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e7f0e13..a512046 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4124,91 +4124,6 @@ int replace_grant_host_mapping(
     return rc;
 }
 
-int donate_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) )
-        goto fail;
-
-    if ( d->is_dying )
-        goto fail;
-
-    if ( page->count_info & ~(PGC_allocated | 1) )
-        goto fail;
-
-    if ( !(memflags & MEMF_no_refcount) )
-    {
-        if ( d->tot_pages >= d->max_pages )
-            goto fail;
-        domain_adjust_tot_pages(d, 1);
-    }
-
-    page->count_info = PGC_allocated | 1;
-    page_set_owner(page, d);
-    page_list_add_tail(page,&d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    MEM_LOG("Bad donate %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%"
PRtype_info,
-            (void *)page_to_mfn(page), d, d->domain_id,
-            page_get_owner(page), page->count_info,
page->u.inuse.type_info);
-    return -1;
-}
-
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    unsigned long x, y;
-    bool_t drop_dom_ref = 0;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || (page_get_owner(page) != d) )
-        goto fail;
-
-    /*
-     * We require there is just one reference (PGC_allocated). We temporarily
-     * drop this reference now so that we can safely swizzle the owner.
-     */
-    y = page->count_info;
-    do {
-        x = y;
-        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
-            goto fail;
-        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
-    } while ( y != x );
-
-    /* Swizzle the owner then reinstate the PGC_allocated reference. */
-    page_set_owner(page, NULL);
-    y = page->count_info;
-    do {
-        x = y;
-        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
-    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
-
-    /* Unlink from original owner. */
-    if ( !(memflags & MEMF_no_refcount) &&
!domain_adjust_tot_pages(d, -1) )
-        drop_dom_ref = 1;
-    page_list_del(page, &d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    if ( unlikely(drop_dom_ref) )
-        put_domain(d);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    MEM_LOG("Bad page %p: ed=%p(%u), sd=%p, caf=%08lx, taf=%"
PRtype_info,
-            (void *)page_to_mfn(page), d, d->domain_id,
-            page_get_owner(page), page->count_info,
page->u.inuse.type_info);
-    return -1;
-}
-
 static int __do_update_va_mapping(
     unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
 {
diff --git a/xen/common/memory.c b/xen/common/memory.c
index be35c00..c216004 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -257,6 +257,94 @@ end_remove:
     return 1;
 }
 
+int donate_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
+{
+    spin_lock(&d->page_alloc_lock);
+
+    if ( is_xen_heap_page(page) || (page_get_owner(page) != NULL) )
+        goto fail;
+
+    if ( d->is_dying )
+        goto fail;
+
+    if ( page->count_info & ~(PGC_allocated |
+                (-PGC_count_mask & PGC_count_mask)) )
+        goto fail;
+
+    if ( !(memflags & MEMF_no_refcount) )
+    {
+        if ( d->tot_pages >= d->max_pages )
+            goto fail;
+        domain_adjust_tot_pages(d, 1);
+    }
+
+    page->count_info = PGC_allocated | (-PGC_count_mask &
PGC_count_mask);
+    page_set_owner(page, d);
+    page_list_add_tail(page, &d->page_list);
+
+    spin_unlock(&d->page_alloc_lock);
+    return 0;
+
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    dprintk(XENLOG_G_WARNING,
+            "Bad donate %lx: ed=%p(%u), sd=%p, caf=%08lx,
taf=%016lx",
+            page_to_mfn(page), d, d->domain_id,
+            page_get_owner(page), page->count_info,
page->u.inuse.type_info);
+    return -1;
+}
+
+int steal_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
+{
+    unsigned long x, y;
+    bool_t drop_dom_ref = 0;
+
+    spin_lock(&d->page_alloc_lock);
+
+    if ( is_xen_heap_page(page) || (page_get_owner(page) != d) )
+        goto fail;
+
+    /*
+     * We require there is just one reference (PGC_allocated). We temporarily
+     * drop this reference now so that we can safely swizzle the owner.
+     */
+    y = page->count_info;
+    do {
+        x = y;
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+            goto fail;
+        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
+    } while ( y != x );
+
+    /* Swizzle the owner then reinstate the PGC_allocated reference. */
+    page_set_owner(page, NULL);
+    y = page->count_info;
+    do {
+        x = y;
+        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
+    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
+
+    /* Unlink from original owner. */
+    if ( !(memflags & MEMF_no_refcount) &&
!domain_adjust_tot_pages(d, -1) )
+        drop_dom_ref = 1;
+    page_list_del(page, &d->page_list);
+
+    spin_unlock(&d->page_alloc_lock);
+    if ( unlikely(drop_dom_ref) )
+        put_domain(d);
+    return 0;
+
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    dprintk(XENLOG_G_WARNING,
+            "Bad steal page %lx: ed=%p(%u), sd=%p, caf=%08lx,
taf=%lx\n",
+            page_to_mfn(page), d, d->domain_id,
+            page_get_owner(page), page->count_info,
page->u.inuse.type_info);
+    return -1;
+}
+
 static void decrease_reservation(struct memop_args *a)
 {
     unsigned long i, j;
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 467687a..b7a9871 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -318,11 +318,6 @@ static inline int relinquish_shared_pages(struct domain *d)
 /* Arch-specific portion of memory_op hypercall. */
 long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags);
-int donate_page(
-    struct domain *d, struct page_info *page, unsigned int memflags);
-
 #define domain_set_alloc_bitsize(d) ((void)0)
 #define domain_clamp_alloc_bitsize(d, b) (b)
 
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 213fc9c..e5254ee 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -566,11 +566,6 @@ long subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void)
arg);
 int compat_arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void));
 int compat_subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void));
 
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags);
-int donate_page(
-    struct domain *d, struct page_info *page, unsigned int memflags);
-
 int map_ldt_shadow_page(unsigned int);
 
 #define NIL(type) ((type *)-sizeof(type))
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 4f5795c..1dc859b 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -60,6 +60,9 @@ void destroy_xen_mappings(unsigned long v, unsigned long e);
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
 int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
 void get_outstanding_claims(uint64_t *free_pages, uint64_t *outstanding_pages);
+int steal_page(struct domain *d, struct page_info *page, unsigned int
memflags);
+int donate_page(struct domain *d, struct page_info *page, unsigned int
memflags);
+
 
 /* Domain suballocator. These functions are *not* interrupt-safe.*/
 void init_domheap_pages(paddr_t ps, paddr_t pe);
-- 
1.7.2.5

Stefano Stabellini

2013-Sep-27 16:15 UTC

head link

[PATCH v6 2/5] xen/arm: introduce a generic p2m walker and use it in p2m_lookup

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


Changes in v6:
- improve p2m_walker description;
- cast 1UL to (paddr_t);
- handle the first level within the loop;
- use literal 1, 2, 3 for the levels;
- add missing continue in case of superpages;
- handle third level leaves with !table as errors;
- ASSERT second and third are not NULL after being mapped.

Changes in v5:
- align tests;
- comment p2m_walker;
- fix return codes in p2m_walker;
- handle superpages in p2m_walker;
- rename _p2m_lookup to p2m_lookup_f.
---
 xen/arch/arm/p2m.c |  134 ++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 108 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 307c6d4..05048c2 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -31,48 +31,130 @@ void p2m_load_VTTBR(struct domain *d)
 }
 
 /*
- * Lookup the MFN corresponding to a domain''s PFN.
+ * d: domain p2m to walk
+ * paddr: the guest start physical address
+ * order: page order
+ * func: function to call for each stage-2 lpae_t leaf entry found
+ * arg: opaque pointer to pass to func
  *
- * There are no processor functions to do a stage 2 only lookup therefore we
- * do a a software walk.
  */
-paddr_t p2m_lookup(struct domain *d, paddr_t paddr)
+static int p2m_walker(struct domain *d, paddr_t paddr, unsigned int order,
+                      int (*func)(lpae_t *pte, void *arg, int level), void*
arg)
 {
+    lpae_t *first = NULL, *second = NULL, *third = NULL;
     struct p2m_domain *p2m = &d->arch.p2m;
-    lpae_t pte, *first = NULL, *second = NULL, *third = NULL;
-    paddr_t maddr = INVALID_PADDR;
+    int rc = -EFAULT, level = 1;
+    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
+    paddr_t pend = paddr + ((paddr_t)1UL << order);
 
     spin_lock(&p2m->lock);
 
     first = __map_domain_page(p2m->first_level);
 
-    pte = first[first_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
-        goto done;
+    if ( !first )
+        goto err;
 
-    second = map_domain_page(pte.p2m.base);
-    pte = second[second_table_offset(paddr)];
-    if ( !pte.p2m.valid || !pte.p2m.table )
-        goto done;
+    while ( paddr < pend )
+    {
+        rc = -EFAULT;
+        level = 1;
 
-    third = map_domain_page(pte.p2m.base);
-    pte = third[third_table_offset(paddr)];
+        if ( !first[first_table_offset(paddr)].p2m.valid )
+            goto err;
 
-    /* This bit must be one in the level 3 entry */
-    if ( !pte.p2m.table )
-        pte.bits = 0;
+        if ( !first[first_table_offset(paddr)].p2m.table )
+        {
+            rc = func(&first[first_table_offset(paddr)], arg, level);
+            if ( rc != 0 )
+                goto err;
+            paddr += FIRST_SIZE;
+            continue;
+        }
 
-done:
-    if ( pte.p2m.valid )
-        maddr = (pte.bits & PADDR_MASK & PAGE_MASK) | (paddr &
~PAGE_MASK);
+        if ( cur_first_offset != first_table_offset(paddr) )
+        {
+            if (second) unmap_domain_page(second);
+            second =
map_domain_page(first[first_table_offset(paddr)].p2m.base);
+            cur_first_offset = first_table_offset(paddr);
+        }
+        level = 2;
+        ASSERT(second != NULL);
+        rc = -EFAULT;
+        if ( !second[second_table_offset(paddr)].p2m.valid )
+            goto err;
+        if ( !second[second_table_offset(paddr)].p2m.table )
+        {
+            rc = func(&first[first_table_offset(paddr)], arg, level);
+            if ( rc != 0 )
+                goto err;
+            paddr += SECOND_SIZE;
+            continue;
+        }
 
-    if (third) unmap_domain_page(third);
-    if (second) unmap_domain_page(second);
-    if (first) unmap_domain_page(first);
+        if ( cur_second_offset != second_table_offset(paddr) )
+        {
+            if (third) unmap_domain_page(third);
+            third =
map_domain_page(second[second_table_offset(paddr)].p2m.base);
+            cur_second_offset = second_table_offset(paddr);
+        }
+        level = 3;
+        ASSERT(third != NULL);
+        rc = -EFAULT;
+        if ( !third[third_table_offset(paddr)].p2m.table ||
+             !third[third_table_offset(paddr)].p2m.valid )
+            goto err;
+
+        rc = func(&third[third_table_offset(paddr)], arg, level);
+        if ( rc != 0 )
+            goto err;
+
+        paddr += PAGE_SIZE;
+    }
+
+    rc = 0;
+
+err:
+    if ( third ) unmap_domain_page(third);
+    if ( second ) unmap_domain_page(second);
+    if ( first ) unmap_domain_page(first);
 
     spin_unlock(&p2m->lock);
 
-    return maddr;
+    return rc;
+}
+
+struct p2m_lookup_t {
+    paddr_t paddr;
+    paddr_t maddr;
+};
+
+static int p2m_lookup_f(lpae_t *ptep, void *arg, int level)
+{
+    lpae_t pte;
+    struct p2m_lookup_t *p2m = (struct p2m_lookup_t *)arg;
+    ASSERT(level == 3);
+
+    pte = *ptep;
+
+    p2m->maddr = (pte.bits & PADDR_MASK & PAGE_MASK) |
+        (p2m->paddr & ~PAGE_MASK);
+    return 0;
+}
+/*
+ * Lookup the MFN corresponding to a domain''s PFN.
+ *
+ * There are no processor functions to do a stage 2 only lookup therefore we
+ * do a a software walk.
+ */
+paddr_t p2m_lookup(struct domain *d, paddr_t paddr)
+{
+    struct p2m_lookup_t p2m;
+    p2m.paddr = paddr;
+    p2m.maddr = INVALID_PADDR;
+
+    p2m_walker(d, paddr, 0, p2m_lookup_f, &p2m);
+
+    return p2m.maddr;
 }
 
 int guest_physmap_mark_populate_on_demand(struct domain *d,
-- 
1.7.2.5

Stefano Stabellini

2013-Sep-27 16:15 UTC

head link

[PATCH v6 3/5] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range

guest_physmap_pin_range pins a range of guest pages so that their p2m
mappings won''t be changed.
guest_physmap_unpin_range unpins the previously pinned pages.
The pinning is done using a new count_info flag.

Provide empty stubs for x86.


Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Changes in v6:
- guest_physmap_pin_range: check that the pages are normal r/w ram pages;
- guest_physmap_unpin_range: return -EINVAL instead of -EBUSY if a page
is not pinned;
- use a count_info flag rather than using spare bits in the ptes.

Changes in v5:
- return -EBUSY when the P2M_DMA_PIN check fails;
- rename _guest_physmap_pin_range to pin_one_pte;
- rename _guest_physmap_unpin_range to unpin_one_pte.

Changes in v4:
- use p2m_walker to implement guest_physmap_pin_range and
guest_physmap_unpin_range;
- return -EINVAL when the P2M_DMA_PIN check fails;
- change the printk into a gdprintk;
- add a comment on what type of page can be pinned.
---
 xen/arch/arm/p2m.c        |   57 ++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/mm.h  |    7 +++++
 xen/include/asm-x86/p2m.h |   12 +++++++++
 3 files changed, 75 insertions(+), 1 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 05048c2..ff32b5a 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -157,6 +157,49 @@ paddr_t p2m_lookup(struct domain *d, paddr_t paddr)
     return p2m.maddr;
 }
 
+int guest_physmap_pin_range(struct domain *d,
+                            xen_pfn_t gpfn,
+                            unsigned int order)
+{
+    int i;
+    struct page_info *page;
+    xen_pfn_t mfn;
+
+    for ( i = 0; i < (1UL << order); i++ )
+    {
+        mfn = gmfn_to_mfn(d, gpfn + i);
+        page = mfn_to_page(mfn);
+        if ( !page )
+            return -EINVAL;
+        if ( is_iomem_page(mfn) ||
+                !get_page_type(page, PGT_writable_page) )
+            return -EINVAL;
+        if ( test_and_set_bit(_PGC_p2m_pinned, &page->count_info) )
+            return -EBUSY;
+    }
+    return 0;
+}
+
+int guest_physmap_unpin_range(struct domain *d,
+                              xen_pfn_t gpfn,
+                              unsigned int order)
+{
+    int i;
+    struct page_info *page;
+    xen_pfn_t mfn;
+
+    for ( i = 0; i < (1UL << order); i++ )
+    {
+        mfn = gmfn_to_mfn(d, gpfn + i);
+        page = mfn_to_page(mfn);
+        if ( !page )
+            return -EINVAL;
+        if ( !test_and_clear_bit(_PGC_p2m_pinned, &page->count_info) )
+            return -EINVAL;
+    }
+    return 0;
+}
+
 int guest_physmap_mark_populate_on_demand(struct domain *d,
                                           unsigned long gfn,
                                           unsigned int order)
@@ -217,6 +260,7 @@ static int create_p2m_entries(struct domain *d,
     lpae_t *first = NULL, *second = NULL, *third = NULL;
     paddr_t addr;
     unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
+    struct page_info *page;
 
     spin_lock(&p2m->lock);
 
@@ -228,6 +272,8 @@ static int create_p2m_entries(struct domain *d,
 
     for(addr = start_gpaddr; addr < end_gpaddr; addr += PAGE_SIZE)
     {
+        page = NULL;
+
         if ( !first[first_table_offset(addr)].p2m.valid )
         {
             rc = p2m_create_table(d, &first[first_table_offset(addr)]);
@@ -268,11 +314,20 @@ static int create_p2m_entries(struct domain *d,
 
         flush = third[third_table_offset(addr)].p2m.valid;
 
+        if ( flush )
+            page = mfn_to_page(third[third_table_offset(addr)].p2m.base);
+        if ( page && test_bit(_PGC_p2m_pinned,
&page->count_info) )
+        {
+            rc = -EINVAL;
+            gdprintk(XENLOG_WARNING, "cannot change p2m mapping for
paddr=%"PRIpaddr
+                    " domid=%d, the page is pinned\n", addr,
d->domain_id);
+            goto out;
+        }
+
         /* Allocate a new RAM page and attach */
         switch (op) {
             case ALLOCATE:
                 {
-                    struct page_info *page;
                     lpae_t pte;
 
                     rc = -ENOMEM;
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index b7a9871..06b5aad 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -99,6 +99,9 @@ struct page_info
   /* Page is Xen heap? */
 #define _PGC_xen_heap     PG_shift(2)
 #define PGC_xen_heap      PG_mask(1, 2)
+/* The page belong to a guest and it has been pinned. */
+#define _PGC_p2m_pinned   PG_shift(3)
+#define PGC_p2m_pinned    PG_mask(1, 3)
 /* ... */
 /* Page is broken? */
 #define _PGC_broken       PG_shift(7)
@@ -335,6 +338,10 @@ void free_init_memory(void);
 
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                           unsigned int order);
+int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn,
+                            unsigned int order);
+int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn,
+                              unsigned int order);
 
 extern void put_page_type(struct page_info *page);
 static inline void put_page_and_type(struct page_info *page)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 43583b2..b08a722 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -492,6 +492,18 @@ void guest_physmap_remove_page(struct domain *d,
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                           unsigned int order);
+static inline int guest_physmap_pin_range(struct domain *d,
+                                          xen_pfn_t gpfn,
+                                          unsigned int order)
+{
+    return -ENOSYS;
+}
+static inline int guest_physmap_unpin_range(struct domain *d,
+                              xen_pfn_t gpfn,
+                              unsigned int order)
+{
+    return -ENOSYS;
+}
 
 /* Change types across all p2m entries in a domain */
 void p2m_change_entry_type_global(struct domain *d, 
-- 
1.7.2.5

Stefano Stabellini

2013-Sep-27 16:15 UTC

head link

[PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

Introduce two new hypercalls XENMEM_exchange_and_pin and XENMEM_unpin.

XENMEM_exchange_and_pin, it''s like XENMEM_exchange but also pins the
new
pages: their p2m mapping are guaranteed not to be changed, until
XENMEM_unpin is called.
XENMEM_exchange_and_pin returns the DMA frame numbers of the new pages
to the caller, even if it''s an autotranslate guest.

The only effect of XENMEM_unpin is to "unpin" the previously
pinned pages. Afterwards the p2m mappings can be transparently changed by
the hypervisor as normal. The memory remains accessible from the guest.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: tim@xen.org
CC: keir@xen.org
CC: JBeulich@suse.com


Changes in v6:
- do not change error paths;
- crash the guest on pinning failure;
- use the right op with hypercall_create_continuation;
- comments in public/memory.h should precede the #define;
- make unpin return int;
- improve the description of the new hypercalls;
- make sure to copy back the right value of nr_unpinned even on early
error paths.

Changes in v5:
- memory_exchange: handle guest_physmap_pin_range failures;
- make i an unsigned long in unpinned;
- add nr_unpinned to xen_unpin to report partial success.

Changes in v4:
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_get_dma_buf to XENMEM_unpin;
- move the pinning before we copy back the mfn to the guest;
- propagate errors returned by guest_physmap_pin_range;
- use xen_memory_exchange_t as parameter for XENMEM_exchange_and_pin;
- use an unsigned iterator in unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange.
---
 xen/common/memory.c         |   97 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/public/memory.h |   52 +++++++++++++++++++++++
 2 files changed, 146 insertions(+), 3 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index c216004..31a85be 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -394,7 +394,7 @@ static void decrease_reservation(struct memop_args *a)
     a->nr_done = i;
 }
 
-static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
+static long memory_exchange(int op,
XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
 {
     struct xen_memory_exchange exch;
     PAGE_LIST_HEAD(in_chunk_list);
@@ -470,6 +470,13 @@ static long
memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
         goto fail_early;
     }
 
+    if ( op == XENMEM_exchange_and_pin && !is_hardware_domain(d) )
+    {
+        rc = -EPERM;
+        rcu_unlock_domain(d);
+        goto fail_early;
+    }
+
     memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
         d,
         XENMEMF_get_address_bits(exch.out.mem_flags) ? :
@@ -487,7 +494,7 @@ static long
memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
             if ( __copy_field_to_guest(arg, &exch, nr_exchanged) )
                 return -EFAULT;
             return hypercall_create_continuation(
-                __HYPERVISOR_memory_op, "lh", XENMEM_exchange, arg);
+                __HYPERVISOR_memory_op, "lh", op, arg);
         }
 
         /* Steal a chunk''s worth of input pages from the domain. */
@@ -615,6 +622,21 @@ static long
memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
             {
                 for ( k = 0; k < (1UL << exch.out.extent_order); k++ )
                     set_gpfn_from_mfn(mfn + k, gpfn + k);
+            }
+            if ( op == XENMEM_exchange_and_pin )
+            {
+                rc = guest_physmap_pin_range(d, gpfn, exch.out.extent_order);
+                if ( rc )
+                {
+                    gdprintk(XENLOG_WARNING, "couldn''t pin
gpfn=%"PRIpaddr" order=%u\n",
+                            gpfn, exch.out.extent_order);
+                    rcu_unlock_domain(d);
+                    domain_crash(d);
+                    return -EFAULT;
+                }
+            }
+            if ( op == XENMEM_exchange_and_pin || !paging_mode_translate(d) )
+            {
                 if ( __copy_to_guest_offset(exch.out.extent_start,
                                             (i << out_chunk_order) + j,
                                             &mfn, 1) )
@@ -657,6 +679,70 @@ static long
memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
     return rc;
 }
 
+static int unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg)
+{
+    int rc;
+    unsigned long i = 0;
+    struct xen_unpin unpin;
+    xen_pfn_t gpfn;
+    struct domain *d = NULL;
+
+    if ( copy_from_guest(&unpin, arg, 1) )
+        return -EFAULT;
+
+    /* Various sanity checks. */
+    if ( /* Extent orders are sensible? */
+         (unpin.in.extent_order > MAX_ORDER) ||
+         /* Sizes of input list do not overflow a long? */
+         ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents) )
+    {
+        rc = -EFAULT;
+        goto fail;
+    }
+
+    if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) )
+    {
+        rc = -EFAULT;
+        goto fail;
+    }
+
+    d = rcu_lock_domain_by_any_id(unpin.in.domid);
+    if ( d == NULL )
+    {
+        rc = -ESRCH;
+        goto fail;
+    }
+
+    if ( !is_hardware_domain(d) )
+    {
+        rc = -EPERM;
+        goto fail;
+    }
+
+    for ( ; i < unpin.in.nr_extents; i++ )
+    {
+        if ( unlikely(__copy_from_guest_offset(
+                      &gpfn, unpin.in.extent_start, i, 1)) )
+        {
+            rc = -EFAULT;
+            goto fail;
+        }
+
+        rc = guest_physmap_unpin_range(d, gpfn, unpin.in.extent_order);
+        if ( rc )
+            goto fail;
+    }
+
+    rc = 0;
+
+ fail:
+    unpin.nr_unpinned = i;
+    if ( __copy_field_to_guest(arg, &unpin, nr_unpinned) )
+        rc = -EFAULT;
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -745,8 +831,13 @@ long do_memory_op(unsigned long cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)
 
         break;
 
+    case XENMEM_exchange_and_pin:
     case XENMEM_exchange:
-        rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
+        rc = memory_exchange(op, guest_handle_cast(arg,
xen_memory_exchange_t));
+        break;
+
+    case XENMEM_unpin:
+        rc = unpin(guest_handle_cast(arg, xen_unpin_t));
         break;
 
     case XENMEM_maximum_ram_page:
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 7a26dee..4accbd4 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -105,6 +105,9 @@ struct xen_memory_exchange {
     /*
      * [IN] Details of memory extents to be exchanged (GMFN bases).
      * Note that @in.address_bits is ignored and unused.
+     * @out.address_bits contains the maximum number of bits addressable
+     * by the caller. The addresses of the newly allocated pages have to
+     * meet this restriction.
      */
     struct xen_memory_reservation in;
 
@@ -459,6 +462,55 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
  * The zero value is appropiate.
  */
 
+/*
+ * This hypercall is similar to XENMEM_exchange: it takes the same
+ * struct as an argument and it exchanges the pages passed in with a new
+ * set of pages. The new pages are going to be "pinned":
it''s guaranteed
+ * that their p2m mapping won''t be changed until explicitly
"unpinned".
+ * Only normal guest r/w memory can be pinned: no granted pages or
+ * ballooned pages.
+ * If return code is zero then @out.extent_list provides the frame
+ * numbers of the newly-allocated memory.
+ * On X86 the frame numbers are machine frame numbers (mfns).
+ * On ARMv7 and ARMv8 the frame numbers are machine frame numbers (mfns).
+ * Returns zero on complete success, otherwise a negative error code.
+ * The most common error codes are:
+ *   -ENOSYS if not implemented
+ *   -EPERM  if the domain is not privileged for this operation
+ *   -EBUSY  if the page is already pinned
+ *   -EFAULT if an internal error occurs
+ * On complete success then always @nr_exchanged == @in.nr_extents.  On
+ * partial success @nr_exchanged indicates how much work was done and a
+ * negative error code is returned.
+ */
+#define XENMEM_exchange_and_pin             26
+
+/*
+ * XENMEM_unpin unpins a set of pages, previously pinned by
+ * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can
+ * be transparently changed by the hypervisor, as usual. The pages are
+ * still accessible from the guest.
+ */
+#define XENMEM_unpin             27
+struct xen_unpin {
+    /*
+     * [IN] Details of memory extents to be unpinned (GMFN bases).
+     * Note that @in.address_bits is ignored and unused.
+     */
+    struct xen_memory_reservation in;
+    /*
+     * [OUT] Number of input extents that were successfully unpinned.
+     *  1. The first @nr_unpinned input extents were successfully
+     *     unpinned.
+     *  2. All other input extents are untouched.
+     *  3. If not all input extents are unpinned then the return code of this
+     *     command will be non-zero.
+     */
+    xen_ulong_t nr_unpinned;
+};
+typedef struct xen_unpin xen_unpin_t;
+DEFINE_XEN_GUEST_HANDLE(xen_unpin_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
-- 
1.7.2.5

Stefano Stabellini

2013-Sep-27 16:16 UTC

head link

[PATCH v6 5/5] xen: introduce XENMEM_pin

Introduce a new hypercall to allow a guest to pin one or more pages
whose machine addresses respect the number of bits addressable by the
caller as expressed by the address_bits field.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/common/memory.c         |   93 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h |   33 +++++++++++++++
 2 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 31a85be..93171d0 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -743,6 +743,95 @@ static int unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg)
     return rc;
 }
 
+static long pin(XEN_GUEST_HANDLE_PARAM(xen_pin_t) arg)
+{
+    int rc;
+    unsigned long i = 0, j;
+    struct xen_pin pin;
+    xen_pfn_t gpfn, mfn;
+    struct domain *d = NULL;
+    unsigned int memflags = 0;
+
+    if ( copy_from_guest(&pin, arg, 1) )
+        return -EFAULT;
+
+    /* Various sanity checks. */
+    if ( /* Extent orders are sensible? */
+         (pin.in.extent_order > MAX_ORDER) ||
+         /* Sizes of input list do not overflow a long? */
+         ((~0UL >> pin.in.extent_order) < pin.in.nr_extents) )
+    {
+        rc = -EFAULT;
+        goto fail;
+    }
+
+    if ( !guest_handle_okay(pin.in.extent_start, pin.in.nr_extents) )
+    {
+        rc = -EFAULT;
+        goto fail;
+    }
+
+    d = rcu_lock_domain_by_any_id(pin.in.domid);
+    if ( d == NULL )
+    {
+        rc = -ESRCH;
+        goto fail;
+    }
+
+    if ( !is_hardware_domain(d) )
+    {
+        rc = -EPERM;
+        goto fail;
+    }
+
+    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
+        d,
+        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
+        (BITS_PER_LONG+PAGE_SHIFT)));
+
+    for ( ; i < pin.in.nr_extents; i++ )
+    {
+        if ( unlikely(__copy_from_guest_offset(
+                      &gpfn, pin.in.extent_start, i, 1)) )
+        {
+            rc = -EFAULT;
+            goto fail;
+        }
+
+        if ( generic_fls64(gpfn << PAGE_SHIFT) > memflags )
+        {
+            rc = -EINVAL;
+            goto fail;
+        }
+
+        rc = guest_physmap_pin_range(d, gpfn, pin.in.extent_order);
+        if ( rc )
+            goto fail;
+
+        for ( j = 0; j < (1UL << pin.in.extent_order); j++ )
+        {
+            mfn = gmfn_to_mfn(d, gpfn);
+            if ( __copy_to_guest_offset(pin.in.extent_start,
+                        (i << pin.in.extent_order) + j,
+                        &mfn, 1) )
+            {
+                rc = -EFAULT;
+                goto fail;
+            }
+        }
+    }
+
+    rc = 0;
+
+ fail:
+    pin.nr_pinned = i;
+    if ( __copy_field_to_guest(arg, &pin, nr_pinned) )
+        rc = -EFAULT;
+
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -840,6 +929,10 @@ long do_memory_op(unsigned long cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = unpin(guest_handle_cast(arg, xen_unpin_t));
         break;
 
+    case XENMEM_pin:
+        rc = pin(guest_handle_cast(arg, xen_pin_t));
+        break;
+
     case XENMEM_maximum_ram_page:
         rc = max_page;
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 4accbd4..cbc0b9a 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -511,6 +511,39 @@ struct xen_unpin {
 typedef struct xen_unpin xen_unpin_t;
 DEFINE_XEN_GUEST_HANDLE(xen_unpin_t);
 
+/*
+ * XENMEM_pin pins a set of pages to make sure that the hypervisor does
+ * not change the p2m mappings for them.
+ *
+ */
+#define XENMEM_pin               28
+struct xen_pin {
+    /*
+     * [IN/OUT] Details of memory extents to be pinned (GMFN bases).
+     * Xen copies back the MFNs corresponding to the GMFNs passed in as
+     * argument.
+     * @in.address_bits contains the maximum number of bits addressable
+     * by the caller. If the machine addresses of the pages to be pinned
+     * are not addressable according to @in.address_bits, the hypercall
+     * fails and returns an errors. The pages are not pinned. Otherwise
+     * the hypercall succeeds.
+     */
+    struct xen_memory_reservation in;
+
+    /*
+     * [OUT] Number of input extents that were successfully pinned.
+     *  1. The first @nr_pinned input extents were successfully
+     *     pinned.
+     *  2. All other input extents are untouched.
+     *  3. If not all input extents are pinned then the return code of this
+     *     command will be non-zero.
+     */
+    xen_ulong_t nr_pinned;
+};
+typedef struct xen_pin xen_pin_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pin_t);
+
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
-- 
1.7.2.5

Jan Beulich

2013-Sep-30 07:26 UTC

head link

Re: [PATCH v6 1/5] xen: move steal_page and donate_page to common code

>>> On 27.09.13 at 18:15, Stefano Stabellini
<stefano.stabellini@eu.citrix.com>
wrote:> Only code movement, except for a small change to the warning printed in
> case of an error in donate_page.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> CC: keir@xen.org 
> CC: JBeulich@suse.com 
> 
> Changes in v6:
> - insert a missing black;
> - replace gdprintk with dprintk(XENLOG_G_ in donate_page;
> - replace printk with dprintk(XENLOG_G_ in steal_page;
> - remove casting of page_to_mfn to void*, use %lx instead;
> - add "steal" to the steal_page gdprintk;
> - replace 1 with (-PGC_count_mask & PGC_count_mask) in donate_page.
But the respective comment on the previous version obviously
also applied to steal_page(), where there are still literal 1s left.

Also for readability having something like

#define PGC_count_1 (-PGC_count_mask & PGC_count_mask)

might be worthwhile.

With at least the first part of these changes
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan Beulich

2013-Sep-30 07:36 UTC

head link

Re: [PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

>>> On 27.09.13 at 18:15, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> Changes in v6:
> - do not change error paths;
> - crash the guest on pinning failure;
Isn''t that a little harsh?
> @@ -615,6 +622,21 @@ static long
memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>              {
>                  for ( k = 0; k < (1UL << exch.out.extent_order);
k++ )
>                      set_gpfn_from_mfn(mfn + k, gpfn + k);
> +            }
> +            if ( op == XENMEM_exchange_and_pin )
> +            {
> +                rc = guest_physmap_pin_range(d, gpfn,
exch.out.extent_order);
> +                if ( rc )
> +                {
> +                    gdprintk(XENLOG_WARNING, "couldn''t pin
gpfn=%"PRIpaddr" order=%u\n",
gpfn isn''t paddr_t, but xen_pfn_t, and hence PRIpaddr is wrong.
> +                            gpfn, exch.out.extent_order);
> +                    rcu_unlock_domain(d);
> +                    domain_crash(d);
Shouldn''t these two be in inverse order?
> +static int unpin(XEN_GUEST_HANDLE_PARAM(xen_unpin_t) arg)
> +{
> +    int rc;
> +    unsigned long i = 0;
> +    struct xen_unpin unpin;
> +    xen_pfn_t gpfn;
> +    struct domain *d = NULL;
> +
> +    if ( copy_from_guest(&unpin, arg, 1) )
> +        return -EFAULT;
> +
> +    /* Various sanity checks. */
> +    if ( /* Extent orders are sensible? */
> +         (unpin.in.extent_order > MAX_ORDER) ||
> +         /* Sizes of input list do not overflow a long? */
Too little modification to the original comments: There''s just one
order and one input list here.
> +         ((~0UL >> unpin.in.extent_order) < unpin.in.nr_extents)
)
> +    {
> +        rc = -EFAULT;
> +        goto fail;
> +    }
> +
> +    if ( !guest_handle_okay(unpin.in.extent_start, unpin.in.nr_extents) )
> +    {
> +        rc = -EFAULT;
> +        goto fail;
> +    }
> +
> +    d = rcu_lock_domain_by_any_id(unpin.in.domid);
> +    if ( d == NULL )
> +    {
> +        rc = -ESRCH;
> +        goto fail;
> +    }
All error paths up to here need to avoid rcu_unlock_domain().

Jan

Jan Beulich

2013-Sep-30 07:41 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

>>> On 27.09.13 at 18:16, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> +static long pin(XEN_GUEST_HANDLE_PARAM(xen_pin_t) arg)
> +{
> +    int rc;
> +    unsigned long i = 0, j;
> +    struct xen_pin pin;
> +    xen_pfn_t gpfn, mfn;
> +    struct domain *d = NULL;
> +    unsigned int memflags = 0;
> +
> +    if ( copy_from_guest(&pin, arg, 1) )
> +        return -EFAULT;
> +
> +    /* Various sanity checks. */
> +    if ( /* Extent orders are sensible? */
> +         (pin.in.extent_order > MAX_ORDER) ||
> +         /* Sizes of input list do not overflow a long? */
Again, please use singular above.
> +         ((~0UL >> pin.in.extent_order) < pin.in.nr_extents) )
> +    {
> +        rc = -EFAULT;
> +        goto fail;
> +    }
> +
> +    if ( !guest_handle_okay(pin.in.extent_start, pin.in.nr_extents) )
> +    {
> +        rc = -EFAULT;
> +        goto fail;
> +    }
> +
> +    d = rcu_lock_domain_by_any_id(pin.in.domid);
> +    if ( d == NULL )
> +    {
> +        rc = -ESRCH;
> +        goto fail;
> +    }
And again all the above yields rcu_unlock_domain() being called
without the domain having got locked.
> +
> +    if ( !is_hardware_domain(d) )
> +    {
> +        rc = -EPERM;
> +        goto fail;
> +    }
> +
> +    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
> +        d,
> +        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
> +        (BITS_PER_LONG+PAGE_SHIFT)));
> +
> +    for ( ; i < pin.in.nr_extents; i++ )
> +    {
> +        if ( unlikely(__copy_from_guest_offset(
> +                      &gpfn, pin.in.extent_start, i, 1)) )
> +        {
> +            rc = -EFAULT;
> +            goto fail;
> +        }
> +
> +        if ( generic_fls64(gpfn << PAGE_SHIFT) > memflags )
Didn''t you mean MEMF_bits(memflags) here?
> +        {
> +            rc = -EINVAL;
> +            goto fail;
> +        }
> +
> +        rc = guest_physmap_pin_range(d, gpfn, pin.in.extent_order);
> +        if ( rc )
> +            goto fail;
> +
> +        for ( j = 0; j < (1UL << pin.in.extent_order); j++ )
> +        {
> +            mfn = gmfn_to_mfn(d, gpfn);
I''m afraid you didn''t even build test this on x86:
There''s no
gmfn_to_mfn() there.

Jan

Stefano Stabellini

2013-Sep-30 12:30 UTC

head link

Re: [PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

On Mon, 30 Sep 2013, Jan Beulich wrote:> >>> On 27.09.13 at 18:15, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> > Changes in v6:
> > - do not change error paths;
> > - crash the guest on pinning failure;
> 
> Isn''t that a little harsh?
I agree that this is harsh, but it''s quite difficult to undo the
page exchange that was previously done up to the point of failure. And a
pinning error can only be caused by a misbehaviour of the guest.
Overall this is a simple solution to a difficult problem.

I''ll make the changes according to all the comments, thanks.

David Vrabel

2013-Sep-30 12:42 UTC

head link

Re: [PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

On 30/09/13 13:30, Stefano Stabellini wrote:> On Mon, 30 Sep 2013, Jan Beulich wrote:
>>>>> On 27.09.13 at 18:15, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
>>> Changes in v6:
>>> - do not change error paths;
>>> - crash the guest on pinning failure;
>>
>> Isn''t that a little harsh?
> 
> I agree that this is harsh, but it''s quite difficult to undo the
> page exchange that was previously done up to the point of failure. And a
> pinning error can only be caused by a misbehaviour of the guest.
> Overall this is a simple solution to a difficult problem.
Do you need to undo the exchange if the pinning failed?  Or could return
an error status of "exchanged but not pinned" and have the guest do
the
right thing?

David

Stefano Stabellini

2013-Sep-30 12:56 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

On Mon, 30 Sep 2013, Jan Beulich wrote:> > +
> > +    if ( !is_hardware_domain(d) )
> > +    {
> > +        rc = -EPERM;
> > +        goto fail;
> > +    }
> > +
> > +    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
> > +        d,
> > +        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
> > +        (BITS_PER_LONG+PAGE_SHIFT)));
> > +
> > +    for ( ; i < pin.in.nr_extents; i++ )
> > +    {
> > +        if ( unlikely(__copy_from_guest_offset(
> > +                      &gpfn, pin.in.extent_start, i, 1)) )
> > +        {
> > +            rc = -EFAULT;
> > +            goto fail;
> > +        }
> > +
> > +        if ( generic_fls64(gpfn << PAGE_SHIFT) > memflags )
> 
> Didn''t you mean MEMF_bits(memflags) here?
memflags is set to MEMF_bits(XXX) in the assignment above

> > +        {
> > +            rc = -EINVAL;
> > +            goto fail;
> > +        }
> > +
> > +        rc = guest_physmap_pin_range(d, gpfn, pin.in.extent_order);
> > +        if ( rc )
> > +            goto fail;
> > +
> > +        for ( j = 0; j < (1UL << pin.in.extent_order); j++ )
> > +        {
> > +            mfn = gmfn_to_mfn(d, gpfn);
> 
> I''m afraid you didn''t even build test this on x86:
There''s no
> gmfn_to_mfn() there.
Argh. On x86 I''ll call get_gfn_unshare and ASSERT that the page is not
shared (guest_physmap_pin_range is not implemented on x86 but it should
have definitely unshared the page or failed doing so).

Ian Campbell

2013-Sep-30 13:02 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

On Mon, 2013-09-30 at 13:56 +0100, Stefano Stabellini
wrote:> On Mon, 30 Sep 2013, Jan Beulich wrote:
> > > +
> > > +    if ( !is_hardware_domain(d) )
> > > +    {
> > > +        rc = -EPERM;
> > > +        goto fail;
> > > +    }
> > > +
> > > +    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
> > > +        d,
> > > +        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
> > > +        (BITS_PER_LONG+PAGE_SHIFT)));
> > > +
> > > +    for ( ; i < pin.in.nr_extents; i++ )
> > > +    {
> > > +        if ( unlikely(__copy_from_guest_offset(
> > > +                      &gpfn, pin.in.extent_start, i, 1)) )
> > > +        {
> > > +            rc = -EFAULT;
> > > +            goto fail;
> > > +        }
> > > +
> > > +        if ( generic_fls64(gpfn << PAGE_SHIFT) >
memflags )
> > 
> > Didn''t you mean MEMF_bits(memflags) here?
> 
> memflags is set to MEMF_bits(XXX) in the assignment above
memflags is quite a confusing name for such a variable and/or implicitly
relying on this internal knowledge of the flags layout later is overly
subtle.

If you don''t want to use the accessor as Jan suggested perhaps membits
or just bits would be a better variable name, omitting the MEMF_bits
"wrapper" until the actual callsite which needs it or keep memflags
separately.

Ian.

Tim Deegan

2013-Sep-30 13:03 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

At 13:56 +0100 on 30 Sep (1380549398), Stefano Stabellini
wrote:> On Mon, 30 Sep 2013, Jan Beulich wrote:
> > > +
> > > +    if ( !is_hardware_domain(d) )
> > > +    {
> > > +        rc = -EPERM;
> > > +        goto fail;
> > > +    }
> > > +
> > > +    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
> > > +        d,
> > > +        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
> > > +        (BITS_PER_LONG+PAGE_SHIFT)));
> > > +
> > > +    for ( ; i < pin.in.nr_extents; i++ )
> > > +    {
> > > +        if ( unlikely(__copy_from_guest_offset(
> > > +                      &gpfn, pin.in.extent_start, i, 1)) )
> > > +        {
> > > +            rc = -EFAULT;
> > > +            goto fail;
> > > +        }
> > > +
> > > +        if ( generic_fls64(gpfn << PAGE_SHIFT) >
memflags )
> > 
> > Didn''t you mean MEMF_bits(memflags) here?
> 
> memflags is set to MEMF_bits(XXX) in the assignment above
> 
> 
> > > +        {
> > > +            rc = -EINVAL;
> > > +            goto fail;
> > > +        }
> > > +
> > > +        rc = guest_physmap_pin_range(d, gpfn,
pin.in.extent_order);
> > > +        if ( rc )
> > > +            goto fail;
> > > +
> > > +        for ( j = 0; j < (1UL << pin.in.extent_order);
j++ )
> > > +        {
> > > +            mfn = gmfn_to_mfn(d, gpfn);
> > 
> > I''m afraid you didn''t even build test this on x86:
There''s no
> > gmfn_to_mfn() there.
> 
> Argh. On x86 I''ll call get_gfn_unshare and ASSERT that the page is
not
> shared (guest_physmap_pin_range is not implemented on x86 but it should
> have definitely unshared the page or failed doing so).
On x86 can you not just ENOSYS the whole thing?

Tim.

Jan Beulich

2013-Sep-30 13:14 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

>>> On 30.09.13 at 14:56, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 30 Sep 2013, Jan Beulich wrote:
>> > +
>> > +    if ( !is_hardware_domain(d) )
>> > +    {
>> > +        rc = -EPERM;
>> > +        goto fail;
>> > +    }
>> > +
>> > +    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
>> > +        d,
>> > +        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
>> > +        (BITS_PER_LONG+PAGE_SHIFT)));
>> > +
>> > +    for ( ; i < pin.in.nr_extents; i++ )
>> > +    {
>> > +        if ( unlikely(__copy_from_guest_offset(
>> > +                      &gpfn, pin.in.extent_start, i, 1)) )
>> > +        {
>> > +            rc = -EFAULT;
>> > +            goto fail;
>> > +        }
>> > +
>> > +        if ( generic_fls64(gpfn << PAGE_SHIFT) >
memflags )
>> 
>> Didn''t you mean MEMF_bits(memflags) here?
> 
> memflags is set to MEMF_bits(XXX) in the assignment above
No, the bit width gets or-ed into memflags - that''s no the same.

Jan

Stefano Stabellini

2013-Sep-30 13:53 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

On Mon, 30 Sep 2013, Tim Deegan wrote:> > > > +        {
> > > > +            rc = -EINVAL;
> > > > +            goto fail;
> > > > +        }
> > > > +
> > > > +        rc = guest_physmap_pin_range(d, gpfn,
pin.in.extent_order);
> > > > +        if ( rc )
> > > > +            goto fail;
> > > > +
> > > > +        for ( j = 0; j < (1UL <<
pin.in.extent_order); j++ )
> > > > +        {
> > > > +            mfn = gmfn_to_mfn(d, gpfn);
> > > 
> > > I''m afraid you didn''t even build test this on
x86: There''s no
> > > gmfn_to_mfn() there.
> > 
> > Argh. On x86 I''ll call get_gfn_unshare and ASSERT that the
page is not
> > shared (guest_physmap_pin_range is not implemented on x86 but it
should
> > have definitely unshared the page or failed doing so).
> 
> On x86 can you not just ENOSYS the whole thing? 
Yeah, I guess it makes sense, but it requires an #ifdef in common code.

Stefano Stabellini

2013-Sep-30 14:31 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

On Mon, 30 Sep 2013, Jan Beulich wrote:> >>> On 30.09.13 at 14:56, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> > On Mon, 30 Sep 2013, Jan Beulich wrote:
> >> > +
> >> > +    if ( !is_hardware_domain(d) )
> >> > +    {
> >> > +        rc = -EPERM;
> >> > +        goto fail;
> >> > +    }
> >> > +
> >> > +    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
> >> > +        d,
> >> > +        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
> >> > +        (BITS_PER_LONG+PAGE_SHIFT)));
> >> > +
> >> > +    for ( ; i < pin.in.nr_extents; i++ )
> >> > +    {
> >> > +        if ( unlikely(__copy_from_guest_offset(
> >> > +                      &gpfn, pin.in.extent_start, i, 1))
)
> >> > +        {
> >> > +            rc = -EFAULT;
> >> > +            goto fail;
> >> > +        }
> >> > +
> >> > +        if ( generic_fls64(gpfn << PAGE_SHIFT) >
memflags )
> >> 
> >> Didn''t you mean MEMF_bits(memflags) here?
> > 
> > memflags is set to MEMF_bits(XXX) in the assignment above
> 
> No, the bit width gets or-ed into memflags - that''s no the same.
The usage of MEMF_bits is indeed wrong: MEMF_bits(n) is
((n)<<_MEMF_bits). Given that XENMEMF_get_address_bits(pin.in.mem_flags)
is the most significant bit allowed in the page address, I think we only
need:

    memflags = domain_clamp_alloc_bitsize(d,
    XENMEMF_get_address_bits(pin.in.mem_flags) ? :
    (BITS_PER_LONG+PAGE_SHIFT));
    
    [...]
    
    if ( generic_fls64(gpfn << PAGE_SHIFT) > memflags )

Jan Beulich

2013-Sep-30 15:27 UTC

head link

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

>>> On 30.09.13 at 16:31, Stefano Stabellini
<stefano.stabellini@eu.citrix.com>
wrote:> On Mon, 30 Sep 2013, Jan Beulich wrote:
>> >>> On 30.09.13 at 14:56, Stefano Stabellini
<stefano.stabellini@eu.citrix.com>
> wrote:
>> > On Mon, 30 Sep 2013, Jan Beulich wrote:
>> >> > +
>> >> > +    if ( !is_hardware_domain(d) )
>> >> > +    {
>> >> > +        rc = -EPERM;
>> >> > +        goto fail;
>> >> > +    }
>> >> > +
>> >> > +    memflags |= MEMF_bits(domain_clamp_alloc_bitsize(
>> >> > +        d,
>> >> > +        XENMEMF_get_address_bits(pin.in.mem_flags) ? :
>> >> > +        (BITS_PER_LONG+PAGE_SHIFT)));
>> >> > +
>> >> > +    for ( ; i < pin.in.nr_extents; i++ )
>> >> > +    {
>> >> > +        if ( unlikely(__copy_from_guest_offset(
>> >> > +                      &gpfn, pin.in.extent_start, i,
1)) )
>> >> > +        {
>> >> > +            rc = -EFAULT;
>> >> > +            goto fail;
>> >> > +        }
>> >> > +
>> >> > +        if ( generic_fls64(gpfn << PAGE_SHIFT)
> memflags )
>> >> 
>> >> Didn''t you mean MEMF_bits(memflags) here?
>> > 
>> > memflags is set to MEMF_bits(XXX) in the assignment above
>> 
>> No, the bit width gets or-ed into memflags - that''s no the
same.
> 
> The usage of MEMF_bits is indeed wrong: MEMF_bits(n) is
> ((n)<<_MEMF_bits). Given that
XENMEMF_get_address_bits(pin.in.mem_flags)
> is the most significant bit allowed in the page address, I think we only
> need:
> 
>     memflags = domain_clamp_alloc_bitsize(d,
>     XENMEMF_get_address_bits(pin.in.mem_flags) ? :
>     (BITS_PER_LONG+PAGE_SHIFT));
>     
>     [...]
>     
>     if ( generic_fls64(gpfn << PAGE_SHIFT) > memflags )
Perhaps, but then - as was recommended already - don#t call
the variable "memflags".

Jan

Xen devel - Sep 2013 - [PATCH v6 0/5] introduce XENMEM_exchange_and_pin, XENMEM_unpin and XENMEM_pin

[PATCH v6 0/5] introduce XENMEM_exchange_and_pin, XENMEM_unpin and XENMEM_pin

[PATCH v6 1/5] xen: move steal_page and donate_page to common code

[PATCH v6 2/5] xen/arm: introduce a generic p2m walker and use it in p2m_lookup

[PATCH v6 3/5] xen: implement guest_physmap_pin_range and guest_physmap_unpin_range

[PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

[PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 1/5] xen: move steal_page and donate_page to common code

Re: [PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

Re: [PATCH v6 4/5] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin

Re: [PATCH v6 5/5] xen: introduce XENMEM_pin