Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 0 of 4] Mem event handling improvements
Several improvements to the mem event interface are included in this series. Most notably:

- The ability to trigger resume actions in the hypervisor via an event channel kick, as opposed to a domctl. Less locking and more batching.
- Improvements to the management of the mem event ring: better return codes, vcpu pause semantics, and handling of corner cases.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Keir Fraser <keir@xen.org>

 xen/arch/x86/hvm/hvm.c          | 21 +++-
 xen/arch/x86/mm/mem_event.c     | 173 +++++++++++++++++++++++++++++----------
 xen/arch/x86/mm/mem_sharing.c   | 27 ++++-
 xen/arch/x86/mm/p2m.c           | 100 ++++++++++++----------
 xen/include/asm-x86/mem_event.h | 16 ++-
 xen/include/xen/sched.h         | 5 +-
 xen/arch/ia64/vmx/vmx_init.c    | 2 +-
 xen/arch/x86/hvm/hvm.c          | 5 +-
 xen/arch/x86/mm/mem_event.c     | 3 +-
 xen/common/event_channel.c      | 75 +++++++++++++---
 xen/include/xen/event.h         | 5 +-
 xen/include/xen/sched.h         | 2 +-
 xen/arch/x86/mm/mem_access.c    | 3 +-
 xen/arch/x86/mm/p2m.c           | 3 +-
 xen/include/asm-x86/p2m.h       | 2 +-
 xen/arch/x86/mm/mem_event.c     | 26 ++++-
 16 files changed, 324 insertions(+), 144 deletions(-)
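To illustrate the first point, here is a rough sketch (not part of the posted series) of the consumer side of the event-channel kick: instead of issuing a resume domctl per event, the pager writes its responses into the back ring and notifies the Xen-bound event channel, with one notification covering a whole batch. The names xce, port and back_ring stand in for the pager's existing mem_event setup, and the libxc calls are the ones assumed available in this timeframe.

    #include <string.h>
    #include <xenctrl.h>
    #include <xen/io/ring.h>
    #include <xen/mem_event.h>

    /* Push one response and kick the event channel; purely illustrative. */
    static int put_response_and_kick(xc_evtchn *xce, evtchn_port_t port,
                                     mem_event_back_ring_t *back_ring,
                                     const mem_event_response_t *rsp)
    {
        RING_IDX rsp_prod = back_ring->rsp_prod_pvt;

        /* Copy the response into the shared ring and make it visible. */
        memcpy(RING_GET_RESPONSE(back_ring, rsp_prod), rsp, sizeof(*rsp));
        back_ring->rsp_prod_pvt = rsp_prod + 1;
        RING_PUSH_RESPONSES(back_ring);

        /* One kick can cover a batch of responses; no domctl required. */
        return xc_evtchn_notify(xce, port);
    }

A caller could push several responses and issue a single xc_evtchn_notify() at the end of the batch, which is the "less locking and more batching" claimed above.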
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 1 of 4] Improve ring management for memory events
xen/arch/x86/hvm/hvm.c | 21 +++- xen/arch/x86/mm/mem_event.c | 173 +++++++++++++++++++++++++++++---------- xen/arch/x86/mm/mem_sharing.c | 27 ++++- xen/arch/x86/mm/p2m.c | 100 ++++++++++++---------- xen/include/asm-x86/mem_event.h | 16 ++- xen/include/xen/sched.h | 5 +- 6 files changed, 232 insertions(+), 110 deletions(-) The memevent code currently has a mechanism for reserving space in the ring before putting an event, but each caller must individually ensure that the vCPUs are correctly paused if no space is available. This fixes that issue by reversing the semantics, we ensure that enough space is always left for one event per vCPU in the ring. If this constraint will be violated by the current vCPU putting another request in the ring, we pause it after putting the current event. If no foreign mappings cause mem events, then no mem events will be lost. Because events caused by foreign mapping are effectively unbound, we cannot guarantee with certainty lack of event loss. Additionally, we ensure that no events are lost by Xen as a consumer if multiple responses land on the ring for a single domctl. While the current domctl-based notifications to Xen disallow batching, this is required for later patches. Finally, mark_and_pause_vcpus was misleading, beacause it wasn''t strictly pausing the vcpu''s, rather sending them to sleep. Change to actual vcpu_pause, which protects wakeups via an atomic count. This is useful when an event pauses a vcpu and the vcpu gets mark and paused due to ring congestion. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4077,7 +4077,6 @@ static int hvm_memory_event_traps(long p struct vcpu* v = current; struct domain *d = v->domain; mem_event_request_t req; - int rc; if ( !(p & HVMPME_MODE_MASK) ) return 0; @@ -4085,10 +4084,6 @@ static int hvm_memory_event_traps(long p if ( (p & HVMPME_onchangeonly) && (value == old) ) return 1; - rc = mem_event_check_ring(d, &d->mem_access); - if ( rc ) - return rc; - memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; req.reason = reason; @@ -4108,7 +4103,21 @@ static int hvm_memory_event_traps(long p req.gla_valid = 1; } - mem_event_put_request(d, &d->mem_access, &req); + if ( mem_event_put_request(d, &d->mem_access, &req) ) + { + /* rc == -ENOSYS + * If there was no ring to send the event, then simply unpause the + * vcpu (if we had previously paused it). It will retry the + * instruction (with the exception "handled") and go on + * rc == -EBUSY + * If the ring is full, the vcpu has been marked and paused, + * so we also need to unpause it (if we had previously paused + * it). Once the consumer depletes the ring, the vcpu will be + * woken up and will retry. The consumer may have lost this + * exception, though. */ + if ( req.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(v); + } return 1; } diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -91,6 +91,9 @@ static int mem_event_enable(struct domai put_gfn(dom_mem_event, ring_gfn); put_gfn(dom_mem_event, shared_gfn); + /* Set the number of currently blocked vCPUs to 0. 
*/ + med->blocked = 0; + /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], current->domain->domain_id); @@ -108,7 +111,7 @@ static int mem_event_enable(struct domai mem_event_ring_lock_init(med); /* Wake any VCPUs paused for memory events */ - mem_event_unpause_vcpus(d); + mem_event_unpause_vcpus(d, med); return 0; @@ -133,31 +136,83 @@ static int mem_event_disable(struct mem_ return 0; } -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +static inline int mem_event_ring_free(struct domain *d, struct mem_event_domain *med) { + int free_requests; + + free_requests = RING_FREE_REQUESTS(&med->front_ring); + if ( unlikely(free_requests < d->max_vcpus) ) + { + /* This may happen during normal operation (hopefully not often). */ + gdprintk(XENLOG_INFO, "mem_event request slots for domain %d: %d\n", + d->domain_id, free_requests); + } + + return free_requests; +} + +/* Return values + * zero: success + * -ENOSYS: no ring + * -EAGAIN: ring is full and the event has not been transmitted. + * Only foreign vcpu''s get EAGAIN + * -EBUSY: guest vcpu has been paused due to ring congestion + */ +int mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +{ + int ret = 0; mem_event_front_ring_t *front_ring; RING_IDX req_prod; + if ( mem_event_check_ring(d, med) ) + return -ENOSYS; + mem_event_ring_lock(med); - front_ring = &med->front_ring; - req_prod = front_ring->req_prod_pvt; + if ( mem_event_ring_free(d, med) == 0 ) + { + /* This *may* happen, but only when foreign mappings generate events. */ + gdprintk(XENLOG_WARNING, "possible lost mem_event for domain %d\n", + d->domain_id); + ret = -EAGAIN; + } + else + { + front_ring = &med->front_ring; + req_prod = front_ring->req_prod_pvt; - /* Copy request */ - memcpy(RING_GET_REQUEST(front_ring, req_prod), req, sizeof(*req)); - req_prod++; + /* Copy request */ + memcpy(RING_GET_REQUEST(front_ring, req_prod), req, sizeof(*req)); + req_prod++; - /* Update ring */ - med->req_producers--; - front_ring->req_prod_pvt = req_prod; - RING_PUSH_REQUESTS(front_ring); + /* Update ring */ + front_ring->req_prod_pvt = req_prod; + RING_PUSH_REQUESTS(front_ring); + } + + /* + * We ensure that each vcpu can put at least *one* event -- because some + * events are not repeatable, such as dropping a page. This will ensure no + * vCPU is left with an event that they must place on the ring, but cannot. + * They will be paused after the event is placed. + * See large comment below in mem_event_unpause_vcpus(). 
+ */ + if ( current->domain == d && + mem_event_ring_free(d, med) < d->max_vcpus ) + { + mem_event_mark_and_pause(current, med); + /* For guest vcpu, overwrite EAGAIN */ + ret = -EBUSY; + } mem_event_ring_unlock(med); notify_via_xen_event_channel(d, med->xen_port); + + return ret; } -void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp) +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, mem_event_response_t *rsp) { mem_event_front_ring_t *front_ring; RING_IDX rsp_cons; @@ -167,6 +222,12 @@ void mem_event_get_response(struct mem_e front_ring = &med->front_ring; rsp_cons = front_ring->rsp_cons; + if ( !RING_HAS_UNCONSUMED_RESPONSES(front_ring) ) + { + mem_event_ring_unlock(med); + return 0; + } + /* Copy response */ memcpy(rsp, RING_GET_RESPONSE(front_ring, rsp_cons), sizeof(*rsp)); rsp_cons++; @@ -176,54 +237,74 @@ void mem_event_get_response(struct mem_e front_ring->sring->rsp_event = rsp_cons + 1; mem_event_ring_unlock(med); + + return 1; } -void mem_event_unpause_vcpus(struct domain *d) +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain *med) { struct vcpu *v; + int free, i, j, k; + int online = d->max_vcpus; + if ( !med->blocked ) + return; + + mem_event_ring_lock(med); + free = mem_event_ring_free(d, med); + + /* + * We ensure that we only have vCPUs online if there are enough free slots + * for their memory events to be processed. This will ensure that no + * memory events are lost (due to the fact that certain types of events + * cannot be replayed, we need to ensure that there is space in the ring + * for when they are hit). + * See large comment above in mem_event_put_request(). + */ for_each_vcpu ( d, v ) - if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) - vcpu_wake(v); + if ( test_bit(_VPF_mem_event, &v->pause_flags) ) + online--; + + /* We remember which vcpu last woke up to avoid scanning always linearly + * from xero and starving higher-numbered vcpus under high load */ + if ( d->vcpu ) + { + for (i = med->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++) + { + k = i % d->max_vcpus; + v = d->vcpu[k]; + if ( !v ) continue; + + if ( !(med->blocked) || online >= free ) + break; + if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) + { + vcpu_unpause(v); + online++; + med->blocked--; + med->last_vcpu_wake_up = k; + } + } + } + + mem_event_ring_unlock(med); } -void mem_event_mark_and_pause(struct vcpu *v) +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med) { - set_bit(_VPF_mem_event, &v->pause_flags); - vcpu_sleep_nosync(v); -} - -void mem_event_put_req_producers(struct mem_event_domain *med) -{ - mem_event_ring_lock(med); - med->req_producers--; - mem_event_ring_unlock(med); + if ( !test_and_set_bit(_VPF_mem_event, &v->pause_flags) ) + { + vcpu_pause_nosync(v); + med->blocked++; + } } int mem_event_check_ring(struct domain *d, struct mem_event_domain *med) { - struct vcpu *curr = current; - int free_requests; - int ring_full = 1; + if ( !med->ring_page ) + return -ENOSYS; - if ( !med->ring_page ) - return -1; - - mem_event_ring_lock(med); - - free_requests = RING_FREE_REQUESTS(&med->front_ring); - if ( med->req_producers < free_requests ) - { - med->req_producers++; - ring_full = 0; - } - - if ( ring_full && (curr->domain == d) ) - mem_event_mark_and_pause(curr); - - mem_event_ring_unlock(med); - - return ring_full; + return 0; } int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, diff -r 19a5a2cddad3 -r ee909e5a9d85 
xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -281,12 +281,19 @@ static struct page_info* mem_sharing_all vcpu_pause_nosync(v); req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; - if(mem_event_check_ring(d, &d->mem_share)) return page; - req.gfn = gfn; req.p2mt = p2m_ram_shared; req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_share, &req); + + /* If there is no ring, and we''re out of memory, then + * there is no way out. */ + if ( mem_event_put_request(d, &d->mem_share, &req) == -ENOSYS ) + { + gdprintk(XENLOG_ERR, + "Failed alloc on unshare path for %hu and no ring " + "to upcall\n", d->domain_id); + domain_crash(d); + } return page; } @@ -300,12 +307,16 @@ int mem_sharing_sharing_resume(struct do { mem_event_response_t rsp; - /* Get request off the ring */ - mem_event_get_response(&d->mem_share, &rsp); + /* Get all requests off the ring */ + while ( mem_event_get_response(d, &d->mem_share, &rsp) ) + { + /* Unpause domain/vcpu */ + if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(d->vcpu[rsp.vcpu_id]); + } - /* Unpause domain/vcpu */ - if( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); + /* Unpause any domains that were paused because the ring was full */ + mem_event_unpause_vcpus(d, &d->mem_paging); return 0; } diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -877,7 +877,7 @@ void p2m_mem_paging_drop_page(struct dom struct vcpu *v = current; mem_event_request_t req; - /* Check that there''s space on the ring for this request */ + /* Check that there''s a paging listener who cares about this */ if ( mem_event_check_ring(d, &d->mem_paging) == 0) { /* Send release notification to pager */ @@ -886,7 +886,8 @@ void p2m_mem_paging_drop_page(struct dom req.gfn = gfn; req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_paging, &req); + /* If the event is lost, tough bananas */ + (void)mem_event_put_request(d, &d->mem_paging, &req); } } @@ -920,9 +921,14 @@ void p2m_mem_paging_populate(struct doma mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); - /* Check that there''s space on the ring for this request */ + /* We''re paging. There should be a ring */ if ( mem_event_check_ring(d, &d->mem_paging) ) + { + gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring " + "in place\n", d->domain_id, gfn); + domain_crash(d); return; + } memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_PAGING; @@ -951,7 +957,6 @@ void p2m_mem_paging_populate(struct doma else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) { /* gfn is already on its way back and vcpu is not paused */ - mem_event_put_req_producers(&d->mem_paging); return; } @@ -960,7 +965,14 @@ void p2m_mem_paging_populate(struct doma req.p2mt = p2mt; req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_paging, &req); + if ( mem_event_put_request(d, &d->mem_paging, &req) == -EBUSY) + { + /* The ring is full, so we unpause the vcpu (if we had paused + * it), and let it retry, once the consumer has made some space + * in the ring. 
Checks when pausing ensure this is a guest vcpu */ + if ( req.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(v); + } } /** @@ -1038,31 +1050,32 @@ void p2m_mem_paging_resume(struct domain p2m_access_t a; mfn_t mfn; - /* Pull the response off the ring */ - mem_event_get_response(&d->mem_paging, &rsp); + /* Pull all responses off the ring */ + while( mem_event_get_response(d, &d->mem_paging, &rsp) ) + { + /* Fix p2m entry if the page was not dropped */ + if ( !(rsp.flags & MEM_EVENT_FLAG_DROP_PAGE) ) + { + p2m_lock(p2m); + mfn = p2m->get_entry(p2m, rsp.gfn, &p2mt, &a, p2m_query, NULL); + /* Allow only pages which were prepared properly, or pages which + * were nominated but not evicted */ + if ( mfn_valid(mfn) && + (p2mt == p2m_ram_paging_in || p2mt == p2m_ram_paging_in_start) ) + { + set_p2m_entry(p2m, rsp.gfn, mfn, PAGE_ORDER_4K, p2m_ram_rw, a); + set_gpfn_from_mfn(mfn_x(mfn), rsp.gfn); + } + p2m_unlock(p2m); + } - /* Fix p2m entry if the page was not dropped */ - if ( !(rsp.flags & MEM_EVENT_FLAG_DROP_PAGE) ) - { - p2m_lock(p2m); - mfn = p2m->get_entry(p2m, rsp.gfn, &p2mt, &a, p2m_query, NULL); - /* Allow only pages which were prepared properly, or pages which - * were nominated but not evicted */ - if ( mfn_valid(mfn) && - (p2mt == p2m_ram_paging_in || p2mt == p2m_ram_paging_in_start) ) - { - set_p2m_entry(p2m, rsp.gfn, mfn, PAGE_ORDER_4K, p2m_ram_rw, a); - set_gpfn_from_mfn(mfn_x(mfn), rsp.gfn); - } - p2m_unlock(p2m); + /* Unpause domain */ + if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(d->vcpu[rsp.vcpu_id]); } - /* Unpause domain */ - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Unpause any domains that were paused because the ring was full */ - mem_event_unpause_vcpus(d); + mem_event_unpause_vcpus(d, &d->mem_paging); } void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, @@ -1073,7 +1086,6 @@ void p2m_mem_access_check(unsigned long unsigned long gfn = gpa >> PAGE_SHIFT; struct domain *d = v->domain; struct p2m_domain* p2m = p2m_get_hostp2m(d); - int res; mfn_t mfn; p2m_type_t p2mt; p2m_access_t p2ma; @@ -1091,17 +1103,14 @@ void p2m_mem_access_check(unsigned long p2m_unlock(p2m); /* Otherwise, check if there is a memory event listener, and send the message along */ - res = mem_event_check_ring(d, &d->mem_access); - if ( res < 0 ) + if ( mem_event_check_ring(d, &d->mem_access) == -ENOSYS ) { /* No listener */ if ( p2m->access_required ) { - printk(XENLOG_INFO - "Memory access permissions failure, no mem_event listener: pausing VCPU %d, dom %d\n", - v->vcpu_id, d->domain_id); - - mem_event_mark_and_pause(v); + gdprintk(XENLOG_INFO, "Memory access permissions failure, no mem_event " + "listener VCPU %d, dom %d\n", v->vcpu_id, d->domain_id); + domain_crash(v->domain); } else { @@ -1113,8 +1122,6 @@ void p2m_mem_access_check(unsigned long return; } - else if ( res > 0 ) - return; /* No space in buffer; VCPU paused */ memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; @@ -1135,9 +1142,12 @@ void p2m_mem_access_check(unsigned long req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_access, &req); + /* If the ring was busy, this guest vcpu got marked and paused (again), so + * we can remove our unconditional vcpu pause. 
*/ + if ( mem_event_put_request(d, &d->mem_access, &req) == -EBUSY ) + vcpu_unpause(v); - /* VCPU paused, mem event request sent */ + /* VCPU paused */ } void p2m_mem_access_resume(struct p2m_domain *p2m) @@ -1145,15 +1155,17 @@ void p2m_mem_access_resume(struct p2m_do struct domain *d = p2m->domain; mem_event_response_t rsp; - mem_event_get_response(&d->mem_access, &rsp); - - /* Unpause domain */ - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); + /* Pull all responses off the ring */ + while( mem_event_get_response(d, &d->mem_access, &rsp) ) + { + /* Unpause domain */ + if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(d->vcpu[rsp.vcpu_id]); + } /* Unpause any domains that were paused because the ring was full or no listener * was available */ - mem_event_unpause_vcpus(d); + mem_event_unpause_vcpus(d, &d->mem_access); } diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/include/asm-x86/mem_event.h --- a/xen/include/asm-x86/mem_event.h +++ b/xen/include/asm-x86/mem_event.h @@ -25,12 +25,18 @@ #define __MEM_EVENT_H__ /* Pauses VCPU while marking pause flag for mem event */ -void mem_event_mark_and_pause(struct vcpu *v); +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med); int mem_event_check_ring(struct domain *d, struct mem_event_domain *med); -void mem_event_put_req_producers(struct mem_event_domain *med); -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req); -void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp); -void mem_event_unpause_vcpus(struct domain *d); +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain *med); + +/* Returns 0 on success, -ENOSYS if there is no ring, -EBUSY if there is + * a ring or no space. FOr success or -EBUSY. the vCPU is left blocked to + * ensure that the ring does not lose events. In general, put_request should + * not fail, as it attempts to ensure that each vCPU can put at least one req. */ +int mem_event_put_request(struct domain *d, struct mem_event_domain *med, + mem_event_request_t *req); +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, + mem_event_response_t *rsp); int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl); diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -183,13 +183,16 @@ struct mem_event_domain { /* ring lock */ spinlock_t ring_lock; - unsigned int req_producers; /* shared page */ mem_event_shared_page_t *shared_page; /* shared ring page */ void *ring_page; /* front-end ring */ mem_event_front_ring_t front_ring; + /* the number of vCPUs blocked */ + unsigned int blocked; + /* The last vcpu woken up */ + unsigned int last_vcpu_wake_up; /* event channel port (vcpu0 only) */ int xen_port; }; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 2 of 4] Create a generic callback mechanism for Xen-bound event channels
xen/arch/ia64/vmx/vmx_init.c | 2 +- xen/arch/x86/hvm/hvm.c | 5 +- xen/arch/x86/mm/mem_event.c | 3 +- xen/common/event_channel.c | 75 ++++++++++++++++++++++++++++++++++--------- xen/include/xen/event.h | 5 ++- xen/include/xen/sched.h | 2 +- 6 files changed, 69 insertions(+), 23 deletions(-) For event channels for which Xen is the consumer, there currently is a single action. With this patch, we allow event channel creators to specify a generic callback (or no callback). Because the expectation is that there will be few callbacks, they are stored in a small table. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Keir Fraser <keir@xen.org> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/arch/ia64/vmx/vmx_init.c --- a/xen/arch/ia64/vmx/vmx_init.c +++ b/xen/arch/ia64/vmx/vmx_init.c @@ -377,7 +377,7 @@ vmx_vcpu_initialise(struct vcpu *v) { struct vmx_ioreq_page *iorp = &v->domain->arch.hvm_domain.ioreq; - int rc = alloc_unbound_xen_event_channel(v, 0); + int rc = alloc_unbound_xen_event_channel(v, 0, NULL); if (rc < 0) return rc; v->arch.arch_vmx.xen_port = rc; diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -979,7 +979,7 @@ int hvm_vcpu_initialise(struct vcpu *v) goto fail3; /* Create ioreq event channel. */ - rc = alloc_unbound_xen_event_channel(v, 0); + rc = alloc_unbound_xen_event_channel(v, 0, NULL); if ( rc < 0 ) goto fail4; @@ -3531,7 +3531,8 @@ long do_hvm_op(unsigned long op, XEN_GUE for_each_vcpu ( d, v ) { int old_port, new_port; - new_port = alloc_unbound_xen_event_channel(v, a.value); + new_port = alloc_unbound_xen_event_channel( + v, a.value, NULL); if ( new_port < 0 ) { rc = new_port; diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -96,7 +96,8 @@ static int mem_event_enable(struct domai /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], - current->domain->domain_id); + current->domain->domain_id, + NULL); if ( rc < 0 ) goto err; diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/common/event_channel.c --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -57,6 +57,51 @@ goto out; \ } while ( 0 ) +#define consumer_is_xen(e) (!!(e)->xen_consumer) + +/* + * The function alloc_unbound_xen_event_channel() allows an arbitrary + * notifier function to be specified. However, very few unique functions + * are specified in practice, so to prevent bloating the evtchn structure + * with a pointer, we stash them dynamically in a small lookup array which + * can be indexed by a small integer. + */ +static xen_event_channel_notification_t xen_consumers[8]; + +/* Default notification action: wake up from wait_on_xen_event_channel(). */ +static void default_xen_notification_fn(struct vcpu *v, unsigned int port) +{ + /* Consumer needs notification only if blocked. */ + if ( test_and_clear_bit(_VPF_blocked_in_xen, &v->pause_flags) ) + vcpu_wake(v); +} + +/* + * Given a notification function, return the value to stash in + * the evtchn->xen_consumer field. 
+ */ +static uint8_t get_xen_consumer(xen_event_channel_notification_t fn) +{ + unsigned int i; + + if ( fn == NULL ) + fn = default_xen_notification_fn; + + for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ ) + { + if ( xen_consumers[i] == NULL ) + xen_consumers[i] = fn; + if ( xen_consumers[i] == fn ) + break; + } + + BUG_ON(i >= ARRAY_SIZE(xen_consumers)); + return i+1; +} + +/* Get the notification function for a given Xen-bound event channel. */ +#define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1]) + static int evtchn_set_pending(struct vcpu *v, int port); static int virq_is_global(int virq) @@ -395,7 +440,7 @@ static long __evtchn_close(struct domain chn1 = evtchn_from_port(d1, port1); /* Guest cannot close a Xen-attached event channel. */ - if ( unlikely(chn1->consumer_is_xen) ) + if ( unlikely(consumer_is_xen(chn1)) ) { rc = -EINVAL; goto out; @@ -533,7 +578,7 @@ int evtchn_send(struct domain *d, unsign lchn = evtchn_from_port(ld, lport); /* Guest cannot send via a Xen-attached event channel. */ - if ( unlikely(lchn->consumer_is_xen) ) + if ( unlikely(consumer_is_xen(lchn)) ) { spin_unlock(&ld->event_lock); return -EINVAL; @@ -550,13 +595,8 @@ int evtchn_send(struct domain *d, unsign rport = lchn->u.interdomain.remote_port; rchn = evtchn_from_port(rd, rport); rvcpu = rd->vcpu[rchn->notify_vcpu_id]; - if ( rchn->consumer_is_xen ) - { - /* Xen consumers need notification only if they are blocked. */ - if ( test_and_clear_bit(_VPF_blocked_in_xen, - &rvcpu->pause_flags) ) - vcpu_wake(rvcpu); - } + if ( consumer_is_xen(rchn) ) + (*xen_notification_fn(rchn))(rvcpu, rport); else { evtchn_set_pending(rvcpu, rport); @@ -783,7 +823,7 @@ long evtchn_bind_vcpu(unsigned int port, chn = evtchn_from_port(d, port); /* Guest cannot re-bind a Xen-attached event channel. */ - if ( unlikely(chn->consumer_is_xen) ) + if ( unlikely(consumer_is_xen(chn)) ) { rc = -EINVAL; goto out; @@ -994,7 +1034,8 @@ long do_event_channel_op(int cmd, XEN_GU int alloc_unbound_xen_event_channel( - struct vcpu *local_vcpu, domid_t remote_domid) + struct vcpu *local_vcpu, domid_t remote_domid, + xen_event_channel_notification_t notification_fn) { struct evtchn *chn; struct domain *d = local_vcpu->domain; @@ -1007,7 +1048,7 @@ int alloc_unbound_xen_event_channel( chn = evtchn_from_port(d, port); chn->state = ECS_UNBOUND; - chn->consumer_is_xen = 1; + chn->xen_consumer = get_xen_consumer(notification_fn); chn->notify_vcpu_id = local_vcpu->vcpu_id; chn->u.unbound.remote_domid = remote_domid; @@ -1034,8 +1075,8 @@ void free_xen_event_channel( BUG_ON(!port_is_valid(d, port)); chn = evtchn_from_port(d, port); - BUG_ON(!chn->consumer_is_xen); - chn->consumer_is_xen = 0; + BUG_ON(!consumer_is_xen(chn)); + chn->xen_consumer = 0; spin_unlock(&d->event_lock); @@ -1059,7 +1100,7 @@ void notify_via_xen_event_channel(struct ASSERT(port_is_valid(ld, lport)); lchn = evtchn_from_port(ld, lport); - ASSERT(lchn->consumer_is_xen); + ASSERT(consumer_is_xen(lchn)); if ( likely(lchn->state == ECS_INTERDOMAIN) ) { @@ -1102,7 +1143,7 @@ void evtchn_destroy(struct domain *d) /* Close all existing event channels. 
*/ for ( i = 0; port_is_valid(d, i); i++ ) { - evtchn_from_port(d, i)->consumer_is_xen = 0; + evtchn_from_port(d, i)->xen_consumer = 0; (void)__evtchn_close(d, i); } @@ -1188,7 +1229,7 @@ static void domain_dump_evtchn_info(stru printk(" v=%d", chn->u.virq); break; } - printk(" x=%d\n", chn->consumer_is_xen); + printk(" x=%d\n", chn->xen_consumer); } spin_unlock(&d->event_lock); diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/include/xen/event.h --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -51,8 +51,11 @@ int evtchn_unmask(unsigned int port); void evtchn_move_pirqs(struct vcpu *v); /* Allocate/free a Xen-attached event channel port. */ +typedef void (*xen_event_channel_notification_t)( + struct vcpu *v, unsigned int port); int alloc_unbound_xen_event_channel( - struct vcpu *local_vcpu, domid_t remote_domid); + struct vcpu *local_vcpu, domid_t remote_domid, + xen_event_channel_notification_t notification_fn); void free_xen_event_channel( struct vcpu *local_vcpu, int port); diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -47,7 +47,7 @@ struct evtchn #define ECS_VIRQ 5 /* Channel is bound to a virtual IRQ line. */ #define ECS_IPI 6 /* Channel is bound to a virtual IPI line. */ u8 state; /* ECS_* */ - u8 consumer_is_xen; /* Consumed by Xen or by guest? */ + u8 xen_consumer; /* Consumer in Xen, if any? (0 = send to guest) */ u16 notify_vcpu_id; /* VCPU for local delivery notification */ union { struct { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 3 of 4] Make the prototype of p2m_mem_access_resume consistent
xen/arch/x86/mm/mem_access.c | 3 +-- xen/arch/x86/mm/p2m.c | 3 +-- xen/include/asm-x86/p2m.h | 2 +- 3 files changed, 3 insertions(+), 5 deletions(-) Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Keir Fraser <keir@xen.org> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r b3cdfb5b76d0 -r 106296812c3f xen/arch/x86/mm/mem_access.c --- a/xen/arch/x86/mm/mem_access.c +++ b/xen/arch/x86/mm/mem_access.c @@ -29,13 +29,12 @@ int mem_access_domctl(struct domain *d, XEN_GUEST_HANDLE(void) u_domctl) { int rc; - struct p2m_domain *p2m = p2m_get_hostp2m(d); switch( mec->op ) { case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_RESUME: { - p2m_mem_access_resume(p2m); + p2m_mem_access_resume(d); rc = 0; } break; diff -r b3cdfb5b76d0 -r 106296812c3f xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -1150,9 +1150,8 @@ void p2m_mem_access_check(unsigned long /* VCPU paused */ } -void p2m_mem_access_resume(struct p2m_domain *p2m) +void p2m_mem_access_resume(struct domain *d) { - struct domain *d = p2m->domain; mem_event_response_t rsp; /* Pull all responses off the ring */ diff -r b3cdfb5b76d0 -r 106296812c3f xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -493,7 +493,7 @@ static inline void p2m_mem_paging_popula void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, bool_t access_r, bool_t access_w, bool_t access_x); /* Resumes the running of the VCPU, restarting the last instruction */ -void p2m_mem_access_resume(struct p2m_domain *p2m); +void p2m_mem_access_resume(struct domain *d); /* Set access type for a region of pfns. * If start_pfn == -1ul, sets the default access type */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 4 of 4] Allow memevent responses to be signaled via the event channel
xen/arch/x86/mm/mem_event.c | 26 ++++++++++++++++++++------ 1 files changed, 20 insertions(+), 6 deletions(-) Don''t require a separate domctl to notify the memevent interface that an event has occured. This domctl can be taxing, particularly when you are scaling events and paging to many domains across a single system. Instead, we use the existing event channel to signal when we place something in the ring (as per normal ring operation). Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Keir Fraser <keir@xen.org> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 106296812c3f -r 08a6ea63be98 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -37,9 +37,11 @@ #define mem_event_ring_lock(_med) spin_lock(&(_med)->ring_lock) #define mem_event_ring_unlock(_med) spin_unlock(&(_med)->ring_lock) -static int mem_event_enable(struct domain *d, - xen_domctl_mem_event_op_t *mec, - struct mem_event_domain *med) +static int mem_event_enable( + struct domain *d, + xen_domctl_mem_event_op_t *mec, + struct mem_event_domain *med, + xen_event_channel_notification_t notification_fn) { int rc; struct domain *dom_mem_event = current->domain; @@ -97,7 +99,7 @@ static int mem_event_enable(struct domai /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], current->domain->domain_id, - NULL); + notification_fn); if ( rc < 0 ) goto err; @@ -308,6 +310,18 @@ int mem_event_check_ring(struct domain * return 0; } +/* Registered with Xen-bound event channel for incoming notifications. */ +static void mem_paging_notification(struct vcpu *v, unsigned int port) +{ + p2m_mem_paging_resume(v->domain); +} + +/* Registered with Xen-bound event channel for incoming notifications. */ +static void mem_access_notification(struct vcpu *v, unsigned int port) +{ + p2m_mem_access_resume(v->domain); +} + int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl) { @@ -369,7 +383,7 @@ int mem_event_domctl(struct domain *d, x if ( p2m->pod.entry_count ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, med, mem_paging_notification); } break; @@ -408,7 +422,7 @@ int mem_event_domctl(struct domain *d, x if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, med, mem_access_notification); } break; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Olaf Hering
2011-Nov-23 18:35 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
On Mon, Nov 14, Andres Lagar-Cavilla wrote:

I will do a more complete review later; a few comments below.

> + free_requests = RING_FREE_REQUESTS(&med->front_ring);
> + if ( unlikely(free_requests < d->max_vcpus) )
> + {
> + /* This may happen during normal operation (hopefully not often). */
> + gdprintk(XENLOG_INFO, "mem_event request slots for domain %d: %d\n",
> + d->domain_id, free_requests);

What is the purpose of this printk? The ring can and does fill up quickly with paging.

> + if ( mem_event_ring_free(d, med) == 0 )
> + {
> + /* This *may* happen, but only when foreign mappings generate events. */
> + gdprintk(XENLOG_WARNING, "possible lost mem_event for domain %d\n",
> + d->domain_id);

This printk is not needed; callers have to try again.

Olaf
Andres Lagar-Cavilla
2011-Nov-23 18:52 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
Olaf,

> On Mon, Nov 14, Andres Lagar-Cavilla wrote:
>
> I will do a more complete review later; a few comments below.
>
>> + free_requests = RING_FREE_REQUESTS(&med->front_ring);
>> + if ( unlikely(free_requests < d->max_vcpus) )
>> + {
>> + /* This may happen during normal operation (hopefully not often). */
>> + gdprintk(XENLOG_INFO, "mem_event request slots for domain %d: %d\n",
>> + d->domain_id, free_requests);
>
> What is the purpose of this printk? The ring can and does fill up quickly
> with paging.

Well, we can tone the printks down to debug level. I don't think they're unnecessary if they're made an optional debug tool.

Question: I have one vcpu; how do I fill up the ring quickly (outside of foreign mappings)?

Andres

>> + if ( mem_event_ring_free(d, med) == 0 )
>> + {
>> + /* This *may* happen, but only when foreign mappings generate events. */
>> + gdprintk(XENLOG_WARNING, "possible lost mem_event for domain %d\n",
>> + d->domain_id);
>
> This printk is not needed; callers have to try again.
>
> Olaf
Olaf Hering
2011-Nov-23 18:57 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
On Wed, Nov 23, Andres Lagar-Cavilla wrote:

> Well, we can tone the printks down to debug level. I don't think they're
> unnecessary if they're made an optional debug tool.

There is nothing to debug here, since the callers have to retry anyway.

> Question: I have one vcpu; how do I fill up the ring quickly (outside of
> foreign mappings)?

Have a balloon driver in the guest and balloon down by more than 64*PAGE_SIZE. This is the default at least in my setup, where the kernel driver releases some memory right away (I haven't checked where this is actually configured).

Olaf
Andres Lagar-Cavilla
2011-Nov-24 18:54 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
> On Wed, Nov 23, Andres Lagar-Cavilla wrote:
>
>> Well, we can tone the printks down to debug level. I don't think they're
>> unnecessary if they're made an optional debug tool.
>
> There is nothing to debug here, since the callers have to retry anyway.
>
>> Question: I have one vcpu; how do I fill up the ring quickly (outside of
>> foreign mappings)?
>
> Have a balloon driver in the guest and balloon down by more than
> 64*PAGE_SIZE. This is the default at least in my setup, where the kernel
> driver releases some memory right away (I haven't checked where this is
> actually configured).

I see: a guest can call decrease_reservation with an extent_order large enough to overflow the ring, no matter the size of the ring. Isn't preemption of this hypercall a better tactic than putting the vcpu on a wait-queue? It won't preclude the need for wait queues, but it feels like a much cleaner solution.

With retrying of foreign mappings in xc_map_foreign_bulk (and grants), I wonder if we should put events in the ring due to foreign mappings *at all* in the case of congestion. Eventually a retry will get to kick the pager.

Andres

> Olaf
Olaf Hering
2011-Nov-24 19:23 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
On Thu, Nov 24, Andres Lagar-Cavilla wrote:

> > On Wed, Nov 23, Andres Lagar-Cavilla wrote:
> >
> >> Well, we can tone the printks down to debug level. I don't think
> >> they're unnecessary if they're made an optional debug tool.
> >
> > There is nothing to debug here, since the callers have to retry anyway.
> >
> >> Question: I have one vcpu; how do I fill up the ring quickly (outside
> >> of foreign mappings)?
> >
> > Have a balloon driver in the guest and balloon down by more than
> > 64*PAGE_SIZE. This is the default at least in my setup, where the kernel
> > driver releases some memory right away (I haven't checked where this is
> > actually configured).
>
> I see: a guest can call decrease_reservation with an extent_order large
> enough to overflow the ring, no matter the size of the ring. Isn't
> preemption of this hypercall a better tactic than putting the vcpu on a
> wait-queue? It won't preclude the need for wait queues, but it feels like
> a much cleaner solution.

Yes, yesterday I was thinking about this as well. p2m_mem_paging_drop_page() should return -EBUSY, but currently not all callers of guest_remove_page() look at the exit code. Perhaps that can be fixed.

> With retrying of foreign mappings in xc_map_foreign_bulk (and grants), I
> wonder if we should put events in the ring due to foreign mappings *at
> all* in the case of congestion. Eventually a retry will get to kick the
> pager.

What do you mean by that?

Olaf
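A minimal sketch of the plumbing being suggested here -- purely illustrative, not a posted patch. It assumes p2m_mem_paging_drop_page() would return -EBUSY on ring congestion and that guest_remove_page() would propagate it; the field names follow xen/common/memory.c loosely, and the extent_order handling and other details are elided.

    /* Hypothetical: preempt decrease_reservation instead of losing a
     * paging drop event when the mem_event ring is congested. */
    static void decrease_reservation(struct memop_args *a)
    {
        unsigned long i;
        xen_pfn_t gmfn;

        for ( i = a->nr_done; i < a->nr_extents; i++ )
        {
            if ( hypercall_preempt_check() )
            {
                a->preempted = 1;
                goto out;
            }

            if ( unlikely(__copy_from_guest_offset(&gmfn, a->extent_list, i, 1)) )
                goto out;

            /* Proposed change: guest_remove_page() propagates -EBUSY from
             * p2m_mem_paging_drop_page(), so we back off and retry this
             * extent once the pager has drained the ring. */
            if ( guest_remove_page(a->domain, gmfn) == -EBUSY )
            {
                a->preempted = 1;
                goto out;
            }
        }

     out:
        a->nr_done = i;
    }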
Andres Lagar-Cavilla
2011-Nov-24 19:35 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
> On Thu, Nov 24, Andres Lagar-Cavilla wrote:
>
>> I see: a guest can call decrease_reservation with an extent_order large
>> enough to overflow the ring, no matter the size of the ring. Isn't
>> preemption of this hypercall a better tactic than putting the vcpu on a
>> wait-queue? It won't preclude the need for wait queues, but it feels
>> like a much cleaner solution.
>
> Yes, yesterday I was thinking about this as well.
> p2m_mem_paging_drop_page() should return -EBUSY, but currently not all
> callers of guest_remove_page() look at the exit code. Perhaps that can
> be fixed.
>
>> With retrying of foreign mappings in xc_map_foreign_bulk (and grants), I
>> wonder if we should put events in the ring due to foreign mappings *at
>> all* in the case of congestion. Eventually a retry will get to kick the
>> pager.
>
> What do you mean by that?

Thinking out loud here. In our ring management patch, we put a guest vcpu to sleep if space in the ring < d->max_vcpus. In the case of a foreign-mapping vcpu, we still allow it to put an event in the ring as long as there is any space. This can eventually fill up the ring and prevent guest vcpus from placing events.

However, we could prevent that by refusing the foreign event whenever accepting it would leave

    space_in_the_ring < (d->max_vcpus - ring->blocked)

This would ensure that no event caused by a guest vcpu is ever lost (we still need the preemption of decrease_reservation, though). Correctness is preserved for the foreign-mapping vcpu: it will retry its mapping, and eventually there will be space in the ring.

With this, we won't need wait-queues for ring management. Makes sense?

Andres

> Olaf
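For concreteness, a sketch of the admission check proposed above; the names are invented and this is not part of the posted series. A foreign vcpu's event is accepted only if, after taking one slot, every guest vcpu that is not already blocked could still place an event of its own.

    /* Hypothetical helper: would the ring still honour the per-vcpu
     * reservation after admitting one foreign-mapping event? */
    static int mem_event_space_for_foreign(struct domain *d,
                                           struct mem_event_domain *med)
    {
        int free = RING_FREE_REQUESTS(&med->front_ring);
        int unblocked = (int)d->max_vcpus - (int)med->blocked;

        /* Refuse if taking a slot would leave fewer free slots than
         * there are unblocked guest vcpus. */
        return ( free - 1 ) >= unblocked;
    }

mem_event_put_request() could consult this when current->domain != d and return -EAGAIN, leaving the retry to xc_map_foreign_bulk as described above.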