Andres Lagar-Cavilla
2012-Feb-16 03:57 UTC
[PATCH 0 of 4] Handling of (some) low memory conditions
After some experiments with the sharing code under low memory conditions, we
post the following series:

- Fix a bug in the sharing unshare path when we run out of memory.
- Sort out the situations in which we can go to sleep on a wait queue if
  unshare fails (and, by extension, the semantics of unshare error handling).
- Prevent a certain Ocaml-based toolstack from crashing domains doing sharing
  or paging.
- Add a VIRQ that the hypervisor can emit when reaching a low memory
  threshold.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>

 xen/common/domctl.c               |   8 +++++-
 xen/arch/x86/mm/mem_event.c       |   5 ++-
 xen/include/asm-x86/mem_event.h   |  30 ++++++++++++++++++---
 xen/arch/x86/hvm/hvm.c            |  20 +++++++++++++-
 xen/arch/x86/mm.c                 |   8 +++--
 xen/arch/x86/mm/mem_sharing.c     |  52 ++++++++++++++++++++++++---------------
 xen/arch/x86/mm/p2m.c             |  18 ++++++++++++-
 xen/common/grant_table.c          |  11 ++++---
 xen/common/memory.c               |   1 +
 xen/include/asm-x86/mem_sharing.h |  15 +++++++++++
 xen/common/page_alloc.c           |  10 +++++++
 xen/include/public/xen.h          |   1 +
 12 files changed, 140 insertions(+), 39 deletions(-)
Andres Lagar-Cavilla
2012-Feb-16 03:57 UTC
[PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
 xen/common/domctl.c |  8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

Apparently, setting d->max_pages to something lower than d->tot_pages is
used as a mechanism for controlling a domain's footprint. It will result
in all new page allocations failing.

This is a really bad idea with paging or sharing, as regular guest memory
accesses may need to be satisfied by allocating new memory (either to
page in or to unshare).

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 62b1fe67b8d1 -r 11fd4e0a1e1a xen/common/domctl.c
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -813,8 +813,14 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
          * NB. We removed a check that new_max >= current tot_pages; this means
          * that the domain will now be allowed to "ratchet" down to new_max. In
          * the meantime, while tot > max, all new allocations are disallowed.
+         *
+         * Except that this is not a great idea for domains doing sharing or
+         * paging, as they need to perform new allocations on the fly.
          */
-        d->max_pages = new_max;
+        if ( (new_max > d->max_pages) ||
+             !((d->mem_event->paging.ring_page != NULL) ||
+               d->arch.hvm_domain.mem_sharing_enabled) )
+            d->max_pages = new_max;
         ret = 0;
         spin_unlock(&d->page_alloc_lock);
Andres Lagar-Cavilla
2012-Feb-16 03:57 UTC
[PATCH 2 of 4] x86/mm: Allow to not sleep on mem event ring
 xen/arch/x86/mm/mem_event.c     |  5 +++--
 xen/include/asm-x86/mem_event.h | 30 +++++++++++++++++++++++++-----
 2 files changed, 28 insertions(+), 7 deletions(-)

Under extreme congestion conditions, generating a mem event may put the vcpu
to sleep on a wait queue if the ring is full. This is generally desirable,
although fairly convoluted to work with, since sleeping on a wait queue
requires a non-atomic context (i.e. no locks held).

Introduce an allow_sleep flag to make this optional. The API remains such that
all current callers set allow_sleep to true and thus will sleep if necessary.

The end-use is for cases in which loss of guest mem events is tolerable. One
such consumer to be added later is the unsharing code under ENOMEM conditions.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>

diff -r 11fd4e0a1e1a -r 09746decbd28 xen/arch/x86/mm/mem_event.c
--- a/xen/arch/x86/mm/mem_event.c
+++ b/xen/arch/x86/mm/mem_event.c
@@ -441,9 +441,10 @@ bool_t mem_event_check_ring(struct mem_e
  *  0: a spot has been reserved
  *
  */
-int mem_event_claim_slot(struct domain *d, struct mem_event_domain *med)
+int __mem_event_claim_slot(struct domain *d, struct mem_event_domain *med,
+                           bool_t allow_sleep)
 {
-    if ( current->domain == d )
+    if ( (current->domain == d) && allow_sleep )
         return mem_event_wait_slot(med);
     else
         return mem_event_grab_slot(med, 1);

diff -r 11fd4e0a1e1a -r 09746decbd28 xen/include/asm-x86/mem_event.h
--- a/xen/include/asm-x86/mem_event.h
+++ b/xen/include/asm-x86/mem_event.h
@@ -28,11 +28,31 @@
 bool_t mem_event_check_ring(struct mem_event_domain *med);

 /* Returns 0 on success, -ENOSYS if there is no ring, -EBUSY if there is no
- * available space. For success or -EBUSY, the vCPU may be left blocked
- * temporarily to ensure that the ring does not lose future events. In
- * general, you must follow a claim_slot() call with either put_request() or
- * cancel_slot(), both of which are guaranteed to succeed. */
-int mem_event_claim_slot(struct domain *d, struct mem_event_domain *med);
+ * available space and the caller is a foreign domain. If the guest itself
+ * is the caller, -EBUSY is avoided by sleeping on a wait queue to ensure
+ * that the ring does not lose future events.
+ *
+ * However, the allow_sleep flag can be set to false in cases in which it is ok
+ * to lose future events, and thus -EBUSY can be returned to guest vcpus
+ * (handle with care!).
+ *
+ * In general, you must follow a claim_slot() call with either put_request()
+ * or cancel_slot(), both of which are guaranteed to succeed.
+ */
+int __mem_event_claim_slot(struct domain *d, struct mem_event_domain *med,
+                           bool_t allow_sleep);
+static inline int mem_event_claim_slot(struct domain *d,
+                                       struct mem_event_domain *med)
+{
+    return __mem_event_claim_slot(d, med, 1);
+}
+
+static inline int mem_event_claim_slot_nosleep(struct domain *d,
+                                               struct mem_event_domain *med)
+{
+    return __mem_event_claim_slot(d, med, 0);
+}

 void mem_event_cancel_slot(struct domain *d, struct mem_event_domain *med);
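As a concrete illustration of the claim/put contract described in the comment
above, the sketch below shows how a producer that can tolerate losing an event
might use the non-sleeping variant. It is only a sketch: the function name and
the request initialisation are assumptions for illustration, not code from
this series.

#include <xen/sched.h>
#include <asm/mem_event.h>

/* Illustrative only: claim a slot without blocking, then put the request.
 * Dropping the event on -EBUSY/-ENOSYS is acceptable here by assumption. */
static void example_notify_nosleep(struct domain *d, unsigned long gfn)
{
    mem_event_request_t req = { .type = MEM_EVENT_TYPE_SHARED, .gfn = gfn };

    if ( mem_event_claim_slot_nosleep(d, &d->mem_event->share) < 0 )
        return;  /* ring absent or full: the event is simply lost */

    /* A successful claim must be followed by put_request() or cancel_slot(). */
    mem_event_put_request(d, &d->mem_event->share, &req);
}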
Andres Lagar-Cavilla
2012-Feb-16 03:57 UTC
[PATCH 3 of 4] Memory sharing: better handling of ENOMEM while unsharing
 xen/arch/x86/hvm/hvm.c            |  20 +++++++++++++-
 xen/arch/x86/mm.c                 |   8 +++--
 xen/arch/x86/mm/mem_sharing.c     |  52 ++++++++++++++++++++++++---------------
 xen/arch/x86/mm/p2m.c             |  18 ++++++++++++-
 xen/common/grant_table.c          |  11 ++++---
 xen/common/memory.c               |   1 +
 xen/include/asm-x86/mem_sharing.h |  15 +++++++++++
 7 files changed, 94 insertions(+), 31 deletions(-)

If unsharing fails with ENOMEM, we were:
- leaving the list of gfns backed by the shared page in an inconsistent state
- cycling forever in the hap page fault handler
- attempting to produce a mem event (which could sleep on a wait queue)
  while holding locks
- not checking, for all callers, that unshare could have indeed failed

Fix the bugs above, and sanitize callers to place a ring event in an unlocked
context, or without requiring them to go to sleep on a wait queue.

A note on the rationale for unshare error handling:
1. Unshare can only fail with ENOMEM. Any other error condition BUG_ON()s.
2. We notify a potential dom0 helper through a mem_event ring. But we
   allow the notification to not go to sleep. If the event ring is full
   of ENOMEM warnings, then the helper will already have been kicked enough.
3. We cannot "just" go to sleep until the unshare is resolved, because we
   might be buried deep into locks (e.g. something -> copy_to_user ->
   __hvm_copy).
4. So, we make sure we:
   4.1. return an error
   4.2. do not corrupt shared memory
   4.3. do not corrupt guest memory
   4.4. let the guest deal with the error if propagation will reach that far

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 09746decbd28 -r 407b5ac709aa xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1196,7 +1196,7 @@ int hvm_hap_nested_page_fault(unsigned l
     mfn_t mfn;
     struct vcpu *v = current;
     struct p2m_domain *p2m;
-    int rc, fall_through = 0, paged = 0;
+    int rc, fall_through = 0, paged = 0, sharing_enomem = 0;
     mem_event_request_t *req_ptr = NULL;

     /* On Nested Virtualization, walk the guest page table.
@@ -1305,7 +1305,8 @@ int hvm_hap_nested_page_fault(unsigned l
         if ( access_w && (p2mt == p2m_ram_shared) )
         {
             ASSERT(!p2m_is_nestedp2m(p2m));
-            mem_sharing_unshare_page(p2m->domain, gfn, 0);
+            sharing_enomem =
+                (mem_sharing_unshare_page(p2m->domain, gfn, 0) < 0);
             rc = 1;
             goto out_put_gfn;
         }
@@ -1345,8 +1346,23 @@ int hvm_hap_nested_page_fault(unsigned l
 out_put_gfn:
     put_gfn(p2m->domain, gfn);
+    /* All of these are delayed until we exit, since we might
+     * sleep on event ring wait queues, and we must not hold
+     * locks in such circumstance */
     if ( paged )
         p2m_mem_paging_populate(v->domain, gfn);
+    if ( sharing_enomem )
+    {
+        int rv;
+        if ( (rv = mem_sharing_notify_enomem(v->domain, gfn, 1)) < 0 )
+        {
+            gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
+                     "gfn %lx, ENOMEM and no helper (rc %d)\n",
+                     v->domain->domain_id, gfn, rv);
+            /* Crash the domain */
+            rc = 0;
+        }
+    }
     if ( req_ptr )
     {
         mem_access_send_req(v->domain, req_ptr);

diff -r 09746decbd28 -r 407b5ac709aa xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3589,12 +3589,14 @@ int do_mmu_update(
                     /* Unshare the page for RW foreign mappings */
                     if ( l1e_get_flags(l1e) & _PAGE_RW )
                     {
-                        rc = mem_sharing_unshare_page(pg_owner,
-                                                      l1e_get_pfn(l1e),
-                                                      0);
+                        unsigned long gfn = l1e_get_pfn(l1e);
+                        rc = mem_sharing_unshare_page(pg_owner, gfn, 0);
                         if ( rc )
                         {
                             put_gfn(pg_owner, l1egfn);
+                            /* Notify helper, don't care about errors, will not
+                             * sleep on wq, since we're a foreign domain. */
+                            (void)mem_sharing_notify_enomem(pg_owner, gfn, 0);
                             break;
                         }
                     }

diff -r 09746decbd28 -r 407b5ac709aa xen/arch/x86/mm/mem_sharing.c
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -343,35 +343,30 @@ int mem_sharing_audit(void)
 #endif

-static void mem_sharing_notify_helper(struct domain *d, unsigned long gfn)
+int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
+                              bool_t allow_sleep)
 {
     struct vcpu *v = current;
+    int rc;
     mem_event_request_t req = { .type = MEM_EVENT_TYPE_SHARED };

-    if ( v->domain != d )
+    if ( (rc = __mem_event_claim_slot(d,
+                        &d->mem_event->share, allow_sleep)) < 0 )
+        return rc;
+
+    if ( v->domain == d )
     {
-        /* XXX This path needs some attention. For now, just fail foreign
-         * XXX requests to unshare if there's no memory. This replaces
-         * XXX old code that BUG()ed here; the callers now BUG()
-         * XXX elewhere. */
-        gdprintk(XENLOG_ERR,
-                 "Failed alloc on unshare path for foreign (%d) lookup\n",
-                 d->domain_id);
-        return;
+        req.flags = MEM_EVENT_FLAG_VCPU_PAUSED;
+        vcpu_pause_nosync(v);
     }

-    if (mem_event_claim_slot(d, &d->mem_event->share) < 0)
-    {
-        return;
-    }
-
-    req.flags = MEM_EVENT_FLAG_VCPU_PAUSED;
-    vcpu_pause_nosync(v);
-
     req.gfn = gfn;
     req.p2mt = p2m_ram_shared;
     req.vcpu_id = v->vcpu_id;
+
     mem_event_put_request(d, &d->mem_event->share, &req);
+
+    return 0;
 }

 unsigned int mem_sharing_get_nr_saved_mfns(void)
@@ -903,6 +898,20 @@ err_out:
     return ret;
 }

+
+/* A note on the rationale for unshare error handling:
+ *  1. Unshare can only fail with ENOMEM. Any other error conditions BUG_ON()'s
+ *  2. We notify a potential dom0 helper through a mem_event ring. But we
+ *     allow the notification to not go to sleep. If the event ring is full
+ *     of ENOMEM warnings, then it's on the ball.
+ *  3. We cannot go to sleep until the unshare is resolved, because we might
+ *     be buried deep into locks (e.g. something -> copy_to_user -> __hvm_copy)
+ *  4. So, we make sure we:
+ *     4.1. return an error
+ *     4.2. do not corrupt shared memory
+ *     4.3. do not corrupt guest memory
+ *     4.4. let the guest deal with it if the error propagation will reach it
+ */
 int mem_sharing_unshare_page(struct domain *d,
                              unsigned long gfn,
                              uint16_t flags)
@@ -945,7 +954,6 @@ gfn_found:
     /* Do the accounting first. If anything fails below, we have bigger
      * bigger fish to fry. First, remove the gfn from the list. */
     last_gfn = list_has_one_entry(&page->sharing->gfns);
-    mem_sharing_gfn_destroy(d, gfn_info);
     if ( last_gfn )
     {
         /* Clean up shared state */
@@ -959,6 +967,7 @@ gfn_found:
      * (possibly freeing the page), and exit early */
     if ( flags & MEM_SHARING_DESTROY_GFN )
     {
+        mem_sharing_gfn_destroy(d, gfn_info);
         put_page_and_type(page);
         mem_sharing_page_unlock(page);
         if ( last_gfn &&
@@ -971,6 +980,7 @@ gfn_found:

     if ( last_gfn )
     {
+        mem_sharing_gfn_destroy(d, gfn_info);
         /* Making a page private atomically unlocks it */
         BUG_ON(page_make_private(d, page) != 0);
         goto private_page_found;
@@ -982,7 +992,8 @@ gfn_found:
     {
         mem_sharing_page_unlock(old_page);
         put_gfn(d, gfn);
-        mem_sharing_notify_helper(d, gfn);
+        /* Caller is responsible for placing an event
+         * in the ring */
         return -ENOMEM;
     }
@@ -993,6 +1004,7 @@ gfn_found:
     unmap_domain_page(t);

     BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0);
+    mem_sharing_gfn_destroy(d, gfn_info);
     mem_sharing_page_unlock(old_page);
     put_page_and_type(old_page);

diff -r 09746decbd28 -r 407b5ac709aa xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -170,7 +170,10 @@ mfn_t __get_gfn_type_access(struct p2m_d
     if ( q == p2m_unshare && p2m_is_shared(*t) )
     {
         ASSERT(!p2m_is_nestedp2m(p2m));
-        mem_sharing_unshare_page(p2m->domain, gfn, 0);
+        /* Try to unshare. If we fail, communicate ENOMEM without
+         * sleeping. */
+        if ( mem_sharing_unshare_page(p2m->domain, gfn, 0) < 0 )
+            (void)mem_sharing_notify_enomem(p2m->domain, gfn, 0);
         mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
     }
 #endif
@@ -371,6 +374,7 @@ void p2m_teardown(struct p2m_domain *p2m
         if ( mfn_valid(mfn) && (t == p2m_ram_shared) )
         {
             ASSERT(!p2m_is_nestedp2m(p2m));
+            /* Does not fail with ENOMEM given the DESTROY flag */
            BUG_ON(mem_sharing_unshare_page(d, gfn, MEM_SHARING_DESTROY_GFN));
         }
     }
@@ -510,6 +514,18 @@ guest_physmap_add_entry(struct domain *d
             if ( rc )
             {
                 p2m_unlock(p2m);
+                /* NOTE: Should a guest domain bring this upon itself,
+                 * there is not a whole lot we can do. We are buried
+                 * deep in locks from most code paths by now. So, fail
+                 * the call and don't try to sleep on a wait queue
+                 * while placing the mem event.
+                 *
+                 * However, all current (changeset 3432abcf9380) code
+                 * paths avoid this unsavoury situation. For now.
+                 *
+                 * Foreign domains are okay to place an event as they
+                 * won't go to sleep. */
+                (void)mem_sharing_notify_enomem(p2m->domain, gfn + i, 0);
                 return rc;
             }
             omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, p2m_query, NULL);

diff -r 09746decbd28 -r 407b5ac709aa xen/common/grant_table.c
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -112,8 +112,7 @@ static unsigned inline int max_nr_maptra
     p2m_type_t __p2mt;                                    \
     unsigned long __x;                                    \
     __x = mfn_x(get_gfn_unshare((_d), (_gfn), &__p2mt));  \
-    BUG_ON(p2m_is_shared(__p2mt)); /* XXX fixme */        \
-    if ( !p2m_is_valid(__p2mt) )                          \
+    if ( p2m_is_shared(__p2mt) || !p2m_is_valid(__p2mt) ) \
         __x = INVALID_MFN;                                \
     __x; })
 #else
@@ -155,9 +154,11 @@ static int __get_paged_frame(unsigned lo
     else
     {
         mfn = get_gfn_unshare(rd, gfn, &p2mt);
-        BUG_ON(p2m_is_shared(p2mt));
-        /* XXX Here, and above in gfn_to_mfn_private, need to handle
-         * XXX failure to unshare. */
+        if ( p2m_is_shared(p2mt) )
+        {
+            put_gfn(rd, gfn);
+            return GNTST_eagain;
+        }
     }
     if ( p2m_is_valid(p2mt) ) {

diff -r 09746decbd28 -r 407b5ac709aa xen/common/memory.c
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -200,6 +200,7 @@ int guest_remove_page(struct domain *d,
         if ( mem_sharing_unshare_page(d, gmfn, 0) )
         {
             put_gfn(d, gmfn);
+            (void)mem_sharing_notify_enomem(d, gmfn, 0);
             return 0;
         }
         /* Maybe the mfn changed */

diff -r 09746decbd28 -r 407b5ac709aa xen/include/asm-x86/mem_sharing.h
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -53,9 +53,24 @@ int mem_sharing_nominate_page(struct dom
                               int expected_refcnt,
                               shr_handle_t *phandle);
 #define MEM_SHARING_DESTROY_GFN       (1<<1)
+/* Only fails with -ENOMEM */
 int mem_sharing_unshare_page(struct domain *d,
                              unsigned long gfn,
                              uint16_t flags);
+/* If called by a foreign domain, possible errors are
+ *   -EBUSY  -> ring full
+ *   -ENOSYS -> no ring to begin with
+ * and the foreign mapper is responsible for retrying.
+ *
+ * If called by the guest vcpu itself and allow_sleep is set, may
+ * sleep on a wait queue, so the caller is responsible for not
+ * holding locks on entry. It may only fail with ENOSYS.
+ *
+ * If called by the guest vcpu itself and allow_sleep is not set,
+ * then it's the same as a foreign domain.
+ */
+int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
+                              bool_t allow_sleep);
 int mem_sharing_sharing_resume(struct domain *d);
 int mem_sharing_memop(struct domain *d,
                       xen_mem_sharing_op_t *mec);
Andres Lagar-Cavilla
2012-Feb-16 03:57 UTC
[PATCH 4 of 4] Global virq for low memory situations
 xen/common/page_alloc.c  | 10 ++++++++++
 xen/include/public/xen.h |  1 +
 2 files changed, 11 insertions(+), 0 deletions(-)

When a low memory threshold on the Xen heap is reached, we fire a global
dom0 virq. If someone's listening, they can free up some more memory.

The low threshold is configurable via the command line token
"low_mem_virq_limit", and defaults to 64MiB.

We define a new virq, VIRQ_ENOMEM. Potential listeners include squeezed,
xenballoond, or anything else that can be fired through xencommons.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 407b5ac709aa -r 65c15f3c9611 xen/common/page_alloc.c
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -35,6 +35,7 @@
 #include <xen/perfc.h>
 #include <xen/numa.h>
 #include <xen/nodemask.h>
+#include <xen/event.h>
 #include <xen/tmem.h>
 #include <xen/tmem_xen.h>
 #include <public/sysctl.h>
@@ -300,6 +301,12 @@ static unsigned long init_node_heap(int
     return needed;
 }

+/* Default to 64 MiB */
+#define DEFAULT_LOW_MEM_VIRQ_MIB 64
+static unsigned long long __read_mostly opt_low_mem_virq =
+    (DEFAULT_LOW_MEM_VIRQ_MIB << 20);
+size_param("low_mem_virq_limit", opt_low_mem_virq);
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -420,6 +427,9 @@ static struct page_info *alloc_heap_page
     total_avail_pages -= request;
     ASSERT(total_avail_pages >= 0);

+    if ( (total_avail_pages << PAGE_SHIFT) <= opt_low_mem_virq )
+        send_global_virq(VIRQ_ENOMEM);
+
     if ( d != NULL )
         d->last_alloc_node = node;

diff -r 407b5ac709aa -r 65c15f3c9611 xen/include/public/xen.h
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -157,6 +157,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_pfn_t);
 #define VIRQ_PCPU_STATE  9  /* G. (DOM0) PCPU state changed                */
 #define VIRQ_MEM_EVENT  10  /* G. (DOM0) A memory event has occured        */
 #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                   */
+#define VIRQ_ENOMEM     12  /* G. (DOM0) Dangerously low on heap memory    */

 /* Architecture-specific VIRQ definitions. */
 #define VIRQ_ARCH_0    16
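To show what a consumer of the new virq might look like, here is a sketch of a
minimal dom0 listener. It is an assumption for illustration only: the libxc
event-channel calls (xc_evtchn_open/bind_virq/pending/unmask) are assumed to
have the signatures of this era, and VIRQ_ENOMEM is hard-coded to the value
added by the hunk above rather than taken from a patched header.

#include <stdio.h>
#include <xenctrl.h>

#define VIRQ_ENOMEM 12  /* assumption: mirrors the patched public/xen.h */

int main(void)
{
    xc_evtchn *xce = xc_evtchn_open(NULL, 0);
    evtchn_port_or_error_t port, pending;

    if ( xce == NULL )
        return 1;

    port = xc_evtchn_bind_virq(xce, VIRQ_ENOMEM);
    if ( port < 0 )
        return 1;

    /* Block until the hypervisor signals low heap memory. */
    while ( (pending = xc_evtchn_pending(xce)) >= 0 )
    {
        xc_evtchn_unmask(xce, pending);
        /* Policy goes here: e.g. ask the ballooning/squeezing daemon to
         * return memory to the Xen heap. */
        fprintf(stderr, "Xen heap is running low\n");
    }

    xc_evtchn_close(xce);
    return 0;
}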
Jan Beulich
2012-Feb-16 09:16 UTC
Re: [PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
>>> On 16.02.12 at 04:57, Andres Lagar-Cavilla <andres@lagarcavilla.org> wrote:
> xen/common/domctl.c |  8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
>
> Apparently, setting d->max_pages to something lower than d->tot_pages is
> used as a mechanism for controlling a domain's footprint. It will result
> in all new page allocations failing.
>
> This is a really bad idea with paging or sharing, as regular guest memory
> accesses may need to be satisfied by allocating new memory (either to
> page in or to unshare).
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> diff -r 62b1fe67b8d1 -r 11fd4e0a1e1a xen/common/domctl.c
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -813,8 +813,14 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
>           * NB. We removed a check that new_max >= current tot_pages; this means
>           * that the domain will now be allowed to "ratchet" down to new_max. In
>           * the meantime, while tot > max, all new allocations are disallowed.
> +         *
> +         * Except that this is not a great idea for domains doing sharing or
> +         * paging, as they need to perform new allocations on the fly.
>           */
> -        d->max_pages = new_max;
> +        if ( (new_max > d->max_pages) ||
> +             !((d->mem_event->paging.ring_page != NULL) ||
> +               d->arch.hvm_domain.mem_sharing_enabled) )
> +            d->max_pages = new_max;

I don't think this is appropriate here, as it will allow not only
paging/sharing allocations to succeed, but also ones that aren't intended to.

I think that *if* an override to the limit enforcement is needed, it ought to
be implemented by passing a special flag to the page allocator to have it make
an exception. But that's questionable in the first place - if a domain is told
to shrink, it ought to follow that directive (and the paging/sharing code
should then probably obtain a page from inside the guest; I realize that might
be problematic when sharing is enabled but paging isn't).

Jan

>         ret = 0;
>         spin_unlock(&d->page_alloc_lock);
>
Jan Beulich
2012-Feb-16 09:31 UTC
Re: [PATCH 0 of 4] Handling of (some) low memory conditions
>>> On 16.02.12 at 04:57, Andres Lagar-Cavilla <andres@lagarcavilla.org> wrote:
> - Add a VIRQ that the hypervisor can emit when reaching a low memory
>   threshold.

In this patch, which didn't make it to my inbox yet, you will want to
change this

+    if ( (total_avail_pages << PAGE_SHIFT) <= opt_low_mem_virq )

to

    if ( total_avail_pages <= PFN_DOWN(opt_low_mem_virq) )

to avoid the case (on 32-bit hypervisors) where total_avail_pages,
being just 'long', would get significant bits shifted out.

I'm further wondering whether the default value shouldn't be set
dynamically based on available memory and/or taking into account
an eventual dom0_mem= option.

Finally, rate limiting will be needed for sending the vIRQ - currently,
once below the threshold you're sending one on each and every
allocation (which may become harmful if the listener can't really do
anything about the situation).

Jan
Tim Deegan
2012-Feb-16 10:20 UTC
Re: [PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
At 22:57 -0500 on 15 Feb (1329346624), Andres Lagar-Cavilla wrote:
> xen/common/domctl.c |  8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
>
> Apparently, setting d->max_pages to something lower than d->tot_pages is
> used as a mechanism for controlling a domain's footprint. It will result
> in all new page allocations failing.

Yep.

> This is a really bad idea with paging or sharing, as regular guest memory
> accesses may need to be satisfied by allocating new memory (either to
> page in or to unshare).

Nack. If a domain ends up with a max_pages so low that it can't page in,
that's a tools bug. This patch doesn't fix it, because the toolstack could
set new max == current tot (er, +1) and then you have the same problem if
you page in twice. (And also it silently ignores the update rather than
reporting an error.)

Tim.

> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> diff -r 62b1fe67b8d1 -r 11fd4e0a1e1a xen/common/domctl.c
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -813,8 +813,14 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
>           * NB. We removed a check that new_max >= current tot_pages; this means
>           * that the domain will now be allowed to "ratchet" down to new_max. In
>           * the meantime, while tot > max, all new allocations are disallowed.
> +         *
> +         * Except that this is not a great idea for domains doing sharing or
> +         * paging, as they need to perform new allocations on the fly.
>           */
> -        d->max_pages = new_max;
> +        if ( (new_max > d->max_pages) ||
> +             !((d->mem_event->paging.ring_page != NULL) ||
> +               d->arch.hvm_domain.mem_sharing_enabled) )
> +            d->max_pages = new_max;
>         ret = 0;
>         spin_unlock(&d->page_alloc_lock);
>
Andres Lagar-Cavilla
2012-Feb-16 14:40 UTC
Re: [PATCH 0 of 4] Handling of (some) low memory conditions
>>>> On 16.02.12 at 04:57, Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>> wrote:
>> - Add a VIRQ that the hypervisor can emit when reaching a low memory
>>   threshold.
>
> In this patch, which didn't make it to my inbox yet, you will want to
> change this
>
> +    if ( (total_avail_pages << PAGE_SHIFT) <= opt_low_mem_virq )
>
> to
>
>     if ( total_avail_pages <= PFN_DOWN(opt_low_mem_virq) )
>
> to avoid the case (on 32-bit hypervisors) where total_avail_pages,
> being just 'long', would get significant bits shifted out.
>
> I'm further wondering whether the default value shouldn't be set
> dynamically based on available memory and/or taking into account
> an eventual dom0_mem= option.

I can cap or get rid of the threshold (and the virq) if it doesn't make
sense with respect to total memory. I'm not sure about integrating
dom0_mem, since dom0's footprint is also a quantity manipulated by the
receiver of the virq.

> Finally, rate limiting will be needed for sending the vIRQ - currently,
> once below the threshold you're sending one on each and every
> allocation (which may become harmful if the listener can't really do
> anything about the situation).

All three are sensible. I'll look into printk rate-limiting for inspiration.

Thanks,
Andres

> Jan
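To make the rate-limiting point concrete, one possible shape is an
edge-triggered check with a little hysteresis, so the virq fires once when the
heap crosses the threshold and is only re-armed after it recovers. The helper
below is purely a sketch under that assumption (the function name and the
re-arm margin are invented), not the eventual implementation; alloc_heap_pages()
would call it where the posted patch currently sends the virq unconditionally.

/* Sketch only: fire the virq on the downward crossing, re-arm after the
 * heap climbs comfortably back above the threshold. */
static bool_t __read_mostly low_mem_virq_armed = 1;

static void check_low_mem_virq(void)
{
    unsigned long thresh = PFN_DOWN(opt_low_mem_virq);

    if ( low_mem_virq_armed && (total_avail_pages <= thresh) )
    {
        low_mem_virq_armed = 0;
        send_global_virq(VIRQ_ENOMEM);
    }
    else if ( !low_mem_virq_armed &&
              (total_avail_pages > thresh + (thresh >> 3)) )
        low_mem_virq_armed = 1;  /* re-arm with ~12% headroom */
}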
Andres Lagar-Cavilla
2012-Feb-16 14:45 UTC
Re: [PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
> At 22:57 -0500 on 15 Feb (1329346624), Andres Lagar-Cavilla wrote:
>> xen/common/domctl.c |  8 +++++++-
>> 1 files changed, 7 insertions(+), 1 deletions(-)
>>
>>
>> Apparently, setting d->max_pages to something lower than d->tot_pages is
>> used as a mechanism for controlling a domain's footprint. It will result
>> in all new page allocations failing.
>
> Yep.
>
>> This is a really bad idea with paging or sharing, as regular guest memory
>> accesses may need to be satisfied by allocating new memory (either to
>> page in or to unshare).
>
> Nack. If a domain ends up with a max_pages so low that it can't page in,
> that's a tools bug. This patch doesn't fix it, because the toolstack could
> set new max == current tot (er, +1) and then you have the same problem if
> you page in twice. (And also it silently ignores the update rather than
> reporting an error.)

Fair enough (also referring to Jan's comments). We would be building policy
into the hypervisor.

But I've seen squeezed set a criminally low max_pages value (e.g. 256).
Granted, this is squeezed's problem, but shouldn't some sanity checking be
wired into the hypervisor? Why should we even allow max_pages < tot_pages?

Also please note that paging can somewhat deal with this because the pager
gets synchronous kicks in a "clean" context -- the pager can swap out some
stuff with ease. Sharing is seriously disadvantaged here -- the enomem ring
is not mandatory, and failed allocations can happen in runtime paths rather
than on the pager's request.

Andres

> Tim.
>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>
>> diff -r 62b1fe67b8d1 -r 11fd4e0a1e1a xen/common/domctl.c
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -813,8 +813,14 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
>>           * NB. We removed a check that new_max >= current tot_pages; this means
>>           * that the domain will now be allowed to "ratchet" down to new_max. In
>>           * the meantime, while tot > max, all new allocations are disallowed.
>> +         *
>> +         * Except that this is not a great idea for domains doing sharing or
>> +         * paging, as they need to perform new allocations on the fly.
>>           */
>> -        d->max_pages = new_max;
>> +        if ( (new_max > d->max_pages) ||
>> +             !((d->mem_event->paging.ring_page != NULL) ||
>> +               d->arch.hvm_domain.mem_sharing_enabled) )
>> +            d->max_pages = new_max;
>>         ret = 0;
>>         spin_unlock(&d->page_alloc_lock);
>>
Tim Deegan
2012-Feb-16 14:58 UTC
Re: [PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
At 06:45 -0800 on 16 Feb (1329374751), Andres Lagar-Cavilla wrote:
> Fair enough (also referring to Jan's comments). We would be building policy
> into the hypervisor.
>
> But I've seen squeezed set a criminally low max_pages value (e.g. 256).
> Granted, this is squeezed's problem, but shouldn't some sanity checking be
> wired into the hypervisor?

Some operating systems do just fine in 640K. :) But seriously, what lower
limit would we use? Stupidly low max_pages for some uses would be just fine
for others.

> Why should we even allow max_pages < tot_pages?

The reasoning is:
 - the tools want a hard guarantee that a rogue balloon driver can't mess
   up their calculations of how much free RAM there is.
 - when a VM is ballooning down we don't want to have the tools spinning,
   watching actual tot_pages and adjusting max_pages down as it changes.

It's not a particularly nice interface, though, and I'd be happy to see it
revert to the old one (where new max had to be >= current tot). But that
will need a fix for Xapi/squeezed to handle the ballooning-down case some
other way.

Cc-ing xen-API. Any XCP folks got an opinion about this? Would it be easy to
make squeezed not need this behaviour?

Cheers,

Tim.
Jan Beulich
2012-Feb-16 15:22 UTC
Re: [PATCH 0 of 4] Handling of (some) low memory conditions
>>> On 16.02.12 at 15:40, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
>>>>> On 16.02.12 at 04:57, Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>> wrote:
>>> - Add a VIRQ that the hypervisor can emit when reaching a low memory
>>>   threshold.
>>
>> In this patch, which didn't make it to my inbox yet, you will want to
>> change this
>>
>> +    if ( (total_avail_pages << PAGE_SHIFT) <= opt_low_mem_virq )
>>
>> to
>>
>>     if ( total_avail_pages <= PFN_DOWN(opt_low_mem_virq) )
>>
>> to avoid the case (on 32-bit hypervisors) where total_avail_pages,
>> being just 'long', would get significant bits shifted out.
>>
>> I'm further wondering whether the default value shouldn't be set
>> dynamically based on available memory and/or taking into account
>> an eventual dom0_mem= option.
>
> I can cap or get rid of the threshold (and the virq) if it doesn't make
> sense with respect to total memory. I'm not sure about integrating
> dom0_mem, since dom0's footprint is also a quantity manipulated by the
> receiver of the virq.

No, dom0_mem= is only specifying the starting value, and the case that would
be of possible interest is that of having a negative amount specified.

Jan
Jan Beulich
2012-Feb-16 15:32 UTC
Re: [PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
>>> On 16.02.12 at 15:45, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> But I've seen squeezed set a criminally low max_pages value (e.g. 256).
> Granted, this is squeezed's problem, but shouldn't some sanity checking be
> wired into the hypervisor? Why should we even allow max_pages < tot_pages?

The only lower boundary that the hypervisor could (perhaps should) enforce
is that of not being able to guarantee guest forward progress: on x86, up to
2 text pages plus up to 4 data pages, times the number of page table levels
(i.e. 24 pages on 64-bit).

Jan
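For illustration, the kind of floor being suggested could be expressed as a
helper that the max_mem domctl consults before accepting a new ceiling. The
constant follows the arithmetic above, (2 text + 4 data) pages times 4
page-table levels = 24; the helper name and its use are assumptions, not
existing Xen code.

/* Sketch only: refuse a max_pages under which no single guest instruction
 * can be guaranteed to complete on 64-bit x86. */
#define MIN_FORWARD_PROGRESS_PAGES ((2 + 4) * 4)   /* = 24 */

static int check_new_max_pages(unsigned long new_max)
{
    if ( new_max < MIN_FORWARD_PROGRESS_PAGES )
        return -EINVAL;   /* below the forward-progress floor */
    return 0;             /* otherwise, policy stays with the toolstack */
}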
Andres Lagar-Cavilla
2012-Feb-16 15:34 UTC
Re: [PATCH 0 of 4] Handling of (some) low memory conditions
>>>> On 16.02.12 at 15:40, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
>>>> wrote:
>>>>>> On 16.02.12 at 04:57, Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>>> wrote:
>>>> - Add a VIRQ that the hypervisor can emit when reaching a low memory
>>>>   threshold.
>>>
>>> In this patch, which didn't make it to my inbox yet, you will want to
>>> change this
>>>
>>> +    if ( (total_avail_pages << PAGE_SHIFT) <= opt_low_mem_virq )
>>>
>>> to
>>>
>>>     if ( total_avail_pages <= PFN_DOWN(opt_low_mem_virq) )
>>>
>>> to avoid the case (on 32-bit hypervisors) where total_avail_pages,
>>> being just 'long', would get significant bits shifted out.
>>>
>>> I'm further wondering whether the default value shouldn't be set
>>> dynamically based on available memory and/or taking into account
>>> an eventual dom0_mem= option.
>>
>> I can cap or get rid of the threshold (and the virq) if it doesn't make
>> sense with respect to total memory. I'm not sure about integrating
>> dom0_mem, since dom0's footprint is also a quantity manipulated by the
>> receiver of the virq.
>
> No, dom0_mem= is only specifying the starting value, and the case that
> would be of possible interest is that of having a negative amount
> specified.

Sorry, not following entirely. The negative quantity you refer to would be
dom0_mem or the low mem virq threshold? Additional checks pertaining only to
dom0_mem, not in relation to this threshold, would most likely go in a
separate patch. And both quantities are parsed using strtoull, so you get at
most really large numbers.

I certainly need to add additional checks for unsuitable low_mem_virq
thresholds to this patch.

Thanks,
Andres

> Jan
Tim Deegan
2012-Feb-16 16:08 UTC
Re: [PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
At 15:32 +0000 on 16 Feb (1329406348), Jan Beulich wrote:
> >>> On 16.02.12 at 15:45, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> > But I've seen squeezed set a criminally low max_pages value (e.g. 256).
> > Granted, this is squeezed's problem, but shouldn't some sanity checking be
> > wired into the hypervisor? Why should we even allow max_pages < tot_pages?
>
> The only lower boundary that the hypervisor could (perhaps should) enforce
> is that of not being able to guarantee guest forward progress: on x86, up
> to 2 text pages plus up to 4 data pages, times the number of page table
> levels (i.e. 24 pages on 64-bit).

OT: ISTR the actual number of memory accesses needed to complete one x86
instruction is much crazier than that (e.g. a memory-to-memory copy that
faults on the second page of the destination, pushing most of a stack frame
before hitting the SS limit and double-faulting via a task gate, &c). We did
try to count it exactly for the shadow pagetables once but gave up and wildly
overestimated instead (because perf near the limit would have been too awful
anyway).

Tim.
Tim Deegan
2012-Feb-16 16:11 UTC
Re: [PATCH 2 of 4] x86/mm: Allow to not sleep on mem event ring
At 22:57 -0500 on 15 Feb (1329346625), Andres Lagar-Cavilla wrote:
> xen/arch/x86/mm/mem_event.c     |  5 +++--
> xen/include/asm-x86/mem_event.h | 30 +++++++++++++++++++++++++-----
> 2 files changed, 28 insertions(+), 7 deletions(-)
>
>
> Under extreme congestion conditions, generating a mem event may put the
> vcpu to sleep on a wait queue if the ring is full. This is generally
> desirable, although fairly convoluted to work with, since sleeping on a
> wait queue requires a non-atomic context (i.e. no locks held).
>
> Introduce an allow_sleep flag to make this optional. The API remains such
> that all current callers set allow_sleep to true and thus will sleep if
> necessary.
>
> The end-use is for cases in which loss of guest mem events is tolerable.
> One such consumer to be added later is the unsharing code under ENOMEM
> conditions.
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> Signed-off-by: Adin Scannell <adin@scannell.ca>

Hmm. This interface is getting a bit twisty now, but assuming the patches
that use it are OK, then Ack.

Tim.
Tim Deegan
2012-Feb-16 16:19 UTC
Re: [PATCH 3 of 4] Memory sharing: better handling of ENOMEM while unsharing
At 22:57 -0500 on 15 Feb (1329346626), Andres Lagar-Cavilla wrote:
> xen/arch/x86/hvm/hvm.c            |  20 +++++++++++++-
> xen/arch/x86/mm.c                 |   8 +++--
> xen/arch/x86/mm/mem_sharing.c     |  52 ++++++++++++++++++++++++---------------
> xen/arch/x86/mm/p2m.c             |  18 ++++++++++++-
> xen/common/grant_table.c          |  11 ++++---
> xen/common/memory.c               |   1 +
> xen/include/asm-x86/mem_sharing.h |  15 +++++++++++
> 7 files changed, 94 insertions(+), 31 deletions(-)
>
>
> If unsharing fails with ENOMEM, we were:
> - leaving the list of gfns backed by the shared page in an inconsistent
>   state
> - cycling forever in the hap page fault handler
> - attempting to produce a mem event (which could sleep on a wait queue)
>   while holding locks
> - not checking, for all callers, that unshare could have indeed failed
>
> Fix the bugs above, and sanitize callers to place a ring event in an
> unlocked context, or without requiring them to go to sleep on a wait queue.
>
> A note on the rationale for unshare error handling:
> 1. Unshare can only fail with ENOMEM. Any other error condition BUG_ON()s.

Please enforce this in the code -- either the function should just return
success/failure or (better, I think) the caller should ASSERT that it
doesn't see any other error codes.

> 2. We notify a potential dom0 helper through a mem_event ring. But we
>    allow the notification to not go to sleep. If the event ring is full
>    of ENOMEM warnings, then the helper will already have been kicked
>    enough.

Could we just keep a count (or flag) to remind us that an ENOMEM is in the
pipe and avoid having an exception to the waiting rules?

> 3. We cannot "just" go to sleep until the unshare is resolved, because we
>    might be buried deep into locks (e.g. something -> copy_to_user ->
>    __hvm_copy).
> 4. So, we make sure we:
>    4.1. return an error
>    4.2. do not corrupt shared memory
>    4.3. do not corrupt guest memory
>    4.4. let the guest deal with the error if propagation will reach that far

Yep, the other cleanups look good to me.

Cheers,

Tim.
Jan Beulich
2012-Feb-16 16:26 UTC
Re: [PATCH 0 of 4] Handling of (some) low memory conditions
>>> On 16.02.12 at 16:34, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
>>>>> On 16.02.12 at 15:40, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
>>>>> wrote:
>>>>>>> On 16.02.12 at 04:57, Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>>>> wrote:
>>>>> - Add a VIRQ that the hypervisor can emit when reaching a low memory
>>>>>   threshold.
>>>>
>>>> In this patch, which didn't make it to my inbox yet, you will want to
>>>> change this
>>>>
>>>> +    if ( (total_avail_pages << PAGE_SHIFT) <= opt_low_mem_virq )
>>>>
>>>> to
>>>>
>>>>     if ( total_avail_pages <= PFN_DOWN(opt_low_mem_virq) )
>>>>
>>>> to avoid the case (on 32-bit hypervisors) where total_avail_pages,
>>>> being just 'long', would get significant bits shifted out.
>>>>
>>>> I'm further wondering whether the default value shouldn't be set
>>>> dynamically based on available memory and/or taking into account
>>>> an eventual dom0_mem= option.
>>>
>>> I can cap or get rid of the threshold (and the virq) if it doesn't make
>>> sense with respect to total memory. I'm not sure about integrating
>>> dom0_mem, since dom0's footprint is also a quantity manipulated by the
>>> receiver of the virq.
>>
>> No, dom0_mem= is only specifying the starting value, and the case that
>> would be of possible interest is that of having a negative amount
>> specified.
>
> Sorry, not following entirely. The negative quantity you refer to would be
> dom0_mem or the low mem virq threshold? Additional checks pertaining only
> to dom0_mem, not in relation to this threshold, would most likely go in a
> separate patch. And both quantities are parsed using strtoull, so you get
> at most really large numbers.
>
> I certainly need to add additional checks for unsuitable low_mem_virq
> thresholds to this patch.

Consider someone using "dom0_mem=-64M" - that'd make your code raise the
vIRQ immediately, even though the admin decided that leaving 64M inside Xen
is okay at boot time at least.

Jan
Jan Beulich
2012-Feb-16 16:44 UTC
Re: [PATCH 1 of 4] Prevent low values of max_pages for domains doing sharing or paging
>>> On 16.02.12 at 17:08, Tim Deegan <tim@xen.org> wrote:
> At 15:32 +0000 on 16 Feb (1329406348), Jan Beulich wrote:
>> >>> On 16.02.12 at 15:45, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
>> > But I've seen squeezed set a criminally low max_pages value (e.g. 256).
>> > Granted, this is squeezed's problem, but shouldn't some sanity checking be
>> > wired into the hypervisor? Why should we even allow max_pages < tot_pages?
>>
>> The only lower boundary that the hypervisor could (perhaps should) enforce
>> is that of not being able to guarantee guest forward progress: on x86, up
>> to 2 text pages plus up to 4 data pages, times the number of page table
>> levels (i.e. 24 pages on 64-bit).
>
> OT: ISTR the actual number of memory accesses needed to complete one x86
> instruction is much crazier than that (e.g. a memory-to-memory copy that
> faults on the second page of the destination, pushing most of a stack frame
> before hitting the SS limit and double-faulting via a task gate, &c). We
> did try to count it exactly for the shadow pagetables once but gave up and
> wildly overestimated instead (because perf near the limit would have been
> too awful anyway).

Hmm, indeed: when a page fault occurs, you don't need the text or data pages
the original instruction referenced anymore. Instead, you now need up to 2
IDT pages, up to 4 GDT pages, and up to 2 stack pages. So yes, the number is
higher (but not that much). An exception going through a task gate is of
course much uglier, but luckily that doesn't need to be considered, at least
for x86-64.

Jan
Andres Lagar-Cavilla
2012-Feb-17 16:57 UTC
Re: [PATCH 2 of 4] x86/mm: Allow to not sleep on mem event ring
> At 22:57 -0500 on 15 Feb (1329346625), Andres Lagar-Cavilla wrote:
>> xen/arch/x86/mm/mem_event.c     |  5 +++--
>> xen/include/asm-x86/mem_event.h | 30 +++++++++++++++++++++++++-----
>> 2 files changed, 28 insertions(+), 7 deletions(-)
>>
>>
>> Under extreme congestion conditions, generating a mem event may put the
>> vcpu to sleep on a wait queue if the ring is full. This is generally
>> desirable, although fairly convoluted to work with, since sleeping on a
>> wait queue requires a non-atomic context (i.e. no locks held).
>>
>> Introduce an allow_sleep flag to make this optional. The API remains such
>> that all current callers set allow_sleep to true and thus will sleep if
>> necessary.
>>
>> The end-use is for cases in which loss of guest mem events is tolerable.
>> One such consumer to be added later is the unsharing code under ENOMEM
>> conditions.
>>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>> Signed-off-by: Adin Scannell <adin@scannell.ca>
>
> Hmm. This interface is getting a bit twisty now, but assuming the patches
> that use it are OK, then Ack.

Another option is to mark a ring as "best effort", so that the event producer
doesn't have to specify whether sleeping is allowable. The event producer
should know, though, whether the ring is best effort. I think that makes for
a cleaner interface.

Andres

> Tim.
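A sketch of what the "best effort" idea could look like, assuming a new
per-ring flag (the field name is invented for illustration): the
producer-facing call stays unchanged, and the ring itself decides whether
sleeping is ever allowed.

struct mem_event_domain
{
    /* ... existing ring state elided ... */
    bool_t best_effort;   /* hypothetical: drop events rather than sleep */
};

int mem_event_claim_slot(struct domain *d, struct mem_event_domain *med)
{
    /* Only a guest vcpu targeting a non-best-effort ring may sleep. */
    if ( (current->domain == d) && !med->best_effort )
        return mem_event_wait_slot(med);
    else
        return mem_event_grab_slot(med, 1);
}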
Andres Lagar-Cavilla
2012-Feb-17 17:01 UTC
Re: [PATCH 3 of 4] Memory sharing: better handling of ENOMEM while unsharing
> At 22:57 -0500 on 15 Feb (1329346626), Andres Lagar-Cavilla wrote:
>> xen/arch/x86/hvm/hvm.c            |  20 +++++++++++++-
>> xen/arch/x86/mm.c                 |   8 +++--
>> xen/arch/x86/mm/mem_sharing.c     |  52 ++++++++++++++++++++++++---------------
>> xen/arch/x86/mm/p2m.c             |  18 ++++++++++++-
>> xen/common/grant_table.c          |  11 ++++---
>> xen/common/memory.c               |   1 +
>> xen/include/asm-x86/mem_sharing.h |  15 +++++++++++
>> 7 files changed, 94 insertions(+), 31 deletions(-)
>>
>>
>> If unsharing fails with ENOMEM, we were:
>> - leaving the list of gfns backed by the shared page in an inconsistent
>>   state
>> - cycling forever in the hap page fault handler
>> - attempting to produce a mem event (which could sleep on a wait queue)
>>   while holding locks
>> - not checking, for all callers, that unshare could have indeed failed
>>
>> Fix the bugs above, and sanitize callers to place a ring event in an
>> unlocked context, or without requiring them to go to sleep on a wait
>> queue.
>>
>> A note on the rationale for unshare error handling:
>> 1. Unshare can only fail with ENOMEM. Any other error condition BUG_ON()s.
>
> Please enforce this in the code -- either the function should just return
> success/failure or (better, I think) the caller should ASSERT that it
> doesn't see any other error codes.

I'll turn unshare into __unshare and make unshare a wrapper that asserts the
return value semantics of __unshare.

>> 2. We notify a potential dom0 helper through a mem_event ring. But we
>>    allow the notification to not go to sleep. If the event ring is full
>>    of ENOMEM warnings, then the helper will already have been kicked
>>    enough.
>
> Could we just keep a count (or flag) to remind us that an ENOMEM is in the
> pipe and avoid having an exception to the waiting rules?

If we follow the "best effort ring" idea I just proposed for the previous
patch, it would make this one cleaner as well.

I don't think there's a point in remembering that ENOMEMs are in the pipe.
The helper will be woken up by the first ENOMEM; it won't need over 64
notifications to do something about it.

Andres

>> 3. We cannot "just" go to sleep until the unshare is resolved, because we
>>    might be buried deep into locks (e.g. something -> copy_to_user ->
>>    __hvm_copy).
>> 4. So, we make sure we:
>>    4.1. return an error
>>    4.2. do not corrupt shared memory
>>    4.3. do not corrupt guest memory
>>    4.4. let the guest deal with the error if propagation will reach that far
>
> Yep, the other cleanups look good to me.
>
> Cheers,
>
> Tim.
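A sketch of the wrapper mentioned above, assuming the existing entry point is
renamed __mem_sharing_unshare_page(); the exact shape is an assumption rather
than the final patch.

int __mem_sharing_unshare_page(struct domain *d, unsigned long gfn,
                               uint16_t flags);

static inline int mem_sharing_unshare_page(struct domain *d,
                                           unsigned long gfn,
                                           uint16_t flags)
{
    int rc = __mem_sharing_unshare_page(d, gfn, flags);

    /* Enforce the documented semantics: success or -ENOMEM, nothing else. */
    BUG_ON(rc && (rc != -ENOMEM));
    return rc;
}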