The following series is my current state of an attempt to use wait queues for
xenpaging and mem_event handling. I post it here for review and comments.

This series conflicts with work from Andres Lagar-Cavilla.

Olaf

 xen/arch/x86/hvm/emulate.c       |    3
 xen/arch/x86/hvm/hvm.c           |   21 +-
 xen/arch/x86/mm.c                |   49 +----
 xen/arch/x86/mm/guest_walk.c     |    3
 xen/arch/x86/mm/hap/guest_walk.c |    6
 xen/arch/x86/mm/mem_event.c      |  150 +++++++++++++--
 xen/arch/x86/mm/mem_sharing.c    |   27 +-
 xen/arch/x86/mm/p2m-ept.c        |    3
 xen/arch/x86/mm/p2m.c            |  378 +++++++++++++++++++++++++++++++++------
 xen/common/domctl.c              |    3
 xen/common/grant_table.c         |    3
 xen/include/asm-x86/hvm/domain.h |    3
 xen/include/asm-x86/mem_event.h  |    8
 xen/include/asm-x86/p2m.h        |   11 -
 xen/include/public/trace.h       |   31 +++
 xen/include/xen/sched.h          |   16 +
 16 files changed, 569 insertions(+), 146 deletions(-)
Olaf Hering
2011-Dec-01 11:09 UTC
[PATCH 1 of 4] mem_event: use wait queue when ring is full
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322736732 -3600 # Node ID 612f69531fd15cf59c58404f6e4762733a9a268c # Parent 3c4c29899d8a2a0c0f9109b7236da95bb72b77b6 mem_event: use wait queue when ring is full This change is based on an idea/patch from Adin Scannell. If the ring is full, put the current vcpu to sleep if it belongs to the target domain. The wakeup happens in the p2m_*_resume functions. Wakeup will take the number of free slots into account. A request from another domain will use the existing ->req_producers functionality because sleeping is not possible if the vcpu does not belong to the target domain. This change fixes also a bug in p2m_mem_paging_drop_page(). Up to now a full ring will lead to harmless inconsistency in the pager. v4: - fix off-by-one bug in _mem_event_put_request - add mem_event_wake_requesters() and use wake_up_nr() - rename mem_event_mark_and_pause() and mem_event_mark_and_pause() functions - req_producers counts foreign request producers, rename member v3: - rename ->mem_event_bit to ->bit - remove me_ from new VPF_ defines v2: - p2m_mem_paging_populate: move vcpu_pause after put_request, otherwise the vcpu will not wake_up after a wait_event because the pause_count was increased twice. Fixes guest hangs. - update free space check in _mem_event_put_request() - simplify mem_event_put_request() Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4107,7 +4107,7 @@ static int hvm_memory_event_traps(long p return 1; rc = mem_event_check_ring(d, &d->mem_event->access); - if ( rc ) + if ( rc < 0 ) return rc; memset(&req, 0, sizeof(req)); diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -23,6 +23,7 @@ #include <asm/domain.h> #include <xen/event.h> +#include <xen/wait.h> #include <asm/p2m.h> #include <asm/mem_event.h> #include <asm/mem_paging.h> @@ -39,6 +40,7 @@ static int mem_event_enable(struct domain *d, xen_domctl_mem_event_op_t *mec, + int bit, struct mem_event_domain *med) { int rc; @@ -107,8 +109,12 @@ static int mem_event_enable(struct domai mem_event_ring_lock_init(med); + med->bit = bit; + + init_waitqueue_head(&med->wq); + /* Wake any VCPUs paused for memory events */ - mem_event_unpause_vcpus(d); + mem_event_wake_waiters(d, med); return 0; @@ -124,6 +130,9 @@ static int mem_event_enable(struct domai static int mem_event_disable(struct mem_event_domain *med) { + if (!list_empty(&med->wq.list)) + return -EBUSY; + unmap_domain_page(med->ring_page); med->ring_page = NULL; @@ -133,13 +142,23 @@ static int mem_event_disable(struct mem_ return 0; } -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +static int _mem_event_put_request(struct domain *d, + struct mem_event_domain *med, + mem_event_request_t *req) { mem_event_front_ring_t *front_ring; + int free_requests; RING_IDX req_prod; mem_event_ring_lock(med); + free_requests = RING_FREE_REQUESTS(&med->front_ring); + /* Requests from foreign domain were claimed in mem_event_check_ring() */ + if ((current->domain == d && free_requests < med->foreign_producers) || !free_requests) { + mem_event_ring_unlock(med); + return 0; + } + front_ring = &med->front_ring; req_prod = front_ring->req_prod_pvt; @@ -148,13 +167,30 @@ void mem_event_put_request(struct domain req_prod++; /* Update ring */ - med->req_producers--; + if (current->domain != d) + 
med->foreign_producers--; front_ring->req_prod_pvt = req_prod; RING_PUSH_REQUESTS(front_ring); mem_event_ring_unlock(med); notify_via_xen_event_channel(d, med->xen_port); + + return 1; +} + +void mem_event_put_request(struct domain *d, struct mem_event_domain *med, + mem_event_request_t *req) +{ + /* Go to sleep if request came from guest */ + if (current->domain == d) { + wait_event(med->wq, _mem_event_put_request(d, med, req)); + return; + } + /* Ring was full anyway, unable to sleep in non-guest context */ + if (!_mem_event_put_request(d, med, req)) + printk("Failed to put memreq: d %u t %x f %x gfn %lx\n", d->domain_id, + req->type, req->flags, (unsigned long)req->gfn); } void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp) @@ -178,31 +214,95 @@ void mem_event_get_response(struct mem_e mem_event_ring_unlock(med); } -void mem_event_unpause_vcpus(struct domain *d) +/** + * mem_event_wake_requesters - Wake vcpus waiting for room in the ring + * @d: guest domain + * @med: mem_event ring + * + * mem_event_wake_requesters() will wakeup vcpus waiting for room in the + * ring. Only as many as can place another request in the ring will + * resume execution. + */ +void mem_event_wake_requesters(struct mem_event_domain *med) +{ + int free_requests; + + mem_event_ring_lock(med); + + free_requests = RING_FREE_REQUESTS(&med->front_ring); + if ( free_requests ) + wake_up_nr(&med->wq, free_requests); + + mem_event_ring_unlock(med); +} + +/** + * mem_event_wake_waiters - Wake all vcpus waiting for the ring + * @d: guest domain + * @med: mem_event ring + * + * mem_event_wake_waiters() will wakeup all vcpus waiting for the ring to + * become available. + */ +void mem_event_wake_waiters(struct domain *d, struct mem_event_domain *med) { struct vcpu *v; for_each_vcpu ( d, v ) - if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) + if ( test_and_clear_bit(med->bit, &v->pause_flags) ) vcpu_wake(v); } -void mem_event_mark_and_pause(struct vcpu *v) +/** + * mem_event_mark_and_sleep - Put vcpu to sleep + * @v: guest vcpu + * @med: mem_event ring + * + * mem_event_mark_and_sleep() tags vcpu and put it to sleep. + * The vcpu will resume execution in mem_event_wake_waiters(). + */ +void mem_event_mark_and_sleep(struct vcpu *v, struct mem_event_domain *med) { - set_bit(_VPF_mem_event, &v->pause_flags); + set_bit(med->bit, &v->pause_flags); vcpu_sleep_nosync(v); } -void mem_event_put_req_producers(struct mem_event_domain *med) +/** + * mem_event_put_req_producers - Release a claimed slot + * @med: mem_event ring + * + * mem_event_put_req_producers() releases a claimed slot in the mem_event ring. + */ +void mem_event_put_req_producers(struct domain *d, struct mem_event_domain *med) { + if ( current->domain == d ) + return; + mem_event_ring_lock(med); - med->req_producers--; + med->foreign_producers--; mem_event_ring_unlock(med); } +/** + * mem_event_check_ring - Check state of a mem_event ring + * @d: guest domain + * @med: mem_event ring + * + * Return codes: < 0: the ring is not yet configured + * 0: the ring has some room + * > 0: the ring is full + * + * mem_event_check_ring() checks the state of the given mem_event ring. + * If the current vcpu belongs to the guest domain, the function assumes that + * mem_event_put_request() will sleep until the ring has room again. + * + * If the current vcpu does not belong to the target domain the caller must try + * again until there is room. A slot is claimed and the caller can place a + * request. 
If the caller does not need to send a request, the claimed slot has + * to be released with mem_event_put_req_producers(). + */ int mem_event_check_ring(struct domain *d, struct mem_event_domain *med) { - struct vcpu *curr = current; int free_requests; int ring_full = 1; @@ -212,15 +312,15 @@ int mem_event_check_ring(struct domain * mem_event_ring_lock(med); free_requests = RING_FREE_REQUESTS(&med->front_ring); - if ( med->req_producers < free_requests ) + + if ( current->domain == d ) + ring_full = 0; + else if ( med->foreign_producers < free_requests ) { - med->req_producers++; + med->foreign_producers++; ring_full = 0; } - if ( ring_full && (curr->domain == d) ) - mem_event_mark_and_pause(curr); - mem_event_ring_unlock(med); return ring_full; @@ -287,7 +387,7 @@ int mem_event_domctl(struct domain *d, x if ( p2m->pod.entry_count ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, _VPF_mem_paging, med); } break; @@ -326,7 +426,7 @@ int mem_event_domctl(struct domain *d, x if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, _VPF_mem_access, med); } break; diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -253,19 +253,11 @@ static void mem_sharing_audit(void) #endif -static struct page_info* mem_sharing_alloc_page(struct domain *d, - unsigned long gfn) +static void mem_sharing_notify_helper(struct domain *d, unsigned long gfn) { - struct page_info* page; struct vcpu *v = current; mem_event_request_t req; - page = alloc_domheap_page(d, 0); - if(page != NULL) return page; - - memset(&req, 0, sizeof(req)); - req.type = MEM_EVENT_TYPE_SHARED; - if ( v->domain != d ) { /* XXX This path needs some attention. For now, just fail foreign @@ -275,20 +267,20 @@ static struct page_info* mem_sharing_all gdprintk(XENLOG_ERR, "Failed alloc on unshare path for foreign (%d) lookup\n", d->domain_id); - return page; + return; } - vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; + if (mem_event_check_ring(d, &d->mem_event->share) < 0) + return; - if(mem_event_check_ring(d, &d->mem_event->share)) return page; - + memset(&req, 0, sizeof(req)); + req.type = MEM_EVENT_TYPE_SHARED; + req.flags = MEM_EVENT_FLAG_VCPU_PAUSED; req.gfn = gfn; req.p2mt = p2m_ram_shared; req.vcpu_id = v->vcpu_id; mem_event_put_request(d, &d->mem_event->share, &req); - - return page; + vcpu_pause_nosync(v); } unsigned int mem_sharing_get_nr_saved_mfns(void) @@ -653,7 +645,7 @@ gfn_found: if(ret == 0) goto private_page_found; old_page = page; - page = mem_sharing_alloc_page(d, gfn); + page = alloc_domheap_page(d, 0); if(!page) { /* We''ve failed to obtain memory for private page. 
Need to re-add the @@ -661,6 +653,7 @@ gfn_found: list_add(&gfn_info->list, &hash_entry->gfns); put_gfn(d, gfn); shr_unlock(); + mem_sharing_notify_helper(d, gfn); return -ENOMEM; } diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -882,20 +882,12 @@ int p2m_mem_paging_evict(struct domain * */ void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) { - struct vcpu *v = current; - mem_event_request_t req; + mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn }; - /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) == 0) - { - /* Send release notification to pager */ - memset(&req, 0, sizeof(req)); - req.flags |= MEM_EVENT_FLAG_DROP_PAGE; - req.gfn = gfn; - req.vcpu_id = v->vcpu_id; + /* Send release notification to pager */ + req.flags = MEM_EVENT_FLAG_DROP_PAGE; - mem_event_put_request(d, &d->mem_event->paging, &req); - } + mem_event_put_request(d, &d->mem_event->paging, &req); } /** @@ -960,7 +952,7 @@ void p2m_mem_paging_populate(struct doma else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) { /* gfn is already on its way back and vcpu is not paused */ - mem_event_put_req_producers(&d->mem_event->paging); + mem_event_put_req_producers(d, &d->mem_event->paging); return; } @@ -1074,8 +1066,8 @@ void p2m_mem_paging_resume(struct domain if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Unpause any domains that were paused because the ring was full */ - mem_event_unpause_vcpus(d); + /* Wake vcpus waiting for room in the ring */ + mem_event_wake_requesters(&d->mem_event->paging); } void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, @@ -1114,7 +1106,7 @@ void p2m_mem_access_check(unsigned long "Memory access permissions failure, no mem_event listener: pausing VCPU %d, dom %d\n", v->vcpu_id, d->domain_id); - mem_event_mark_and_pause(v); + mem_event_mark_and_sleep(v, &d->mem_event->access); } else { @@ -1135,7 +1127,7 @@ void p2m_mem_access_check(unsigned long /* Pause the current VCPU unconditionally */ vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; + req.flags = MEM_EVENT_FLAG_VCPU_PAUSED; /* Send request to mem event */ req.gfn = gfn; @@ -1164,9 +1156,11 @@ void p2m_mem_access_resume(struct p2m_do if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Unpause any domains that were paused because the ring was full or no listener - * was available */ - mem_event_unpause_vcpus(d); + /* Wake vcpus waiting for room in the ring */ + mem_event_wake_requesters(&d->mem_event->access); + + /* Unpause all vcpus that were paused because no listener was available */ + mem_event_wake_waiters(d, &d->mem_event->access); } diff -r 3c4c29899d8a -r 612f69531fd1 xen/include/asm-x86/mem_event.h --- a/xen/include/asm-x86/mem_event.h +++ b/xen/include/asm-x86/mem_event.h @@ -24,13 +24,13 @@ #ifndef __MEM_EVENT_H__ #define __MEM_EVENT_H__ -/* Pauses VCPU while marking pause flag for mem event */ -void mem_event_mark_and_pause(struct vcpu *v); int mem_event_check_ring(struct domain *d, struct mem_event_domain *med); -void mem_event_put_req_producers(struct mem_event_domain *med); +void mem_event_put_req_producers(struct domain *d, struct mem_event_domain *med); void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req); void mem_event_get_response(struct mem_event_domain *med, 
mem_event_response_t *rsp); -void mem_event_unpause_vcpus(struct domain *d); +void mem_event_wake_requesters(struct mem_event_domain *med); +void mem_event_wake_waiters(struct domain *d, struct mem_event_domain *med); +void mem_event_mark_and_sleep(struct vcpu *v, struct mem_event_domain *med); int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl); diff -r 3c4c29899d8a -r 612f69531fd1 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -14,6 +14,7 @@ #include <xen/nodemask.h> #include <xen/radix-tree.h> #include <xen/multicall.h> +#include <xen/wait.h> #include <public/xen.h> #include <public/domctl.h> #include <public/sysctl.h> @@ -183,7 +184,7 @@ struct mem_event_domain { /* ring lock */ spinlock_t ring_lock; - unsigned int req_producers; + unsigned int foreign_producers; /* shared page */ mem_event_shared_page_t *shared_page; /* shared ring page */ @@ -192,6 +193,10 @@ struct mem_event_domain mem_event_front_ring_t front_ring; /* event channel port (vcpu0 only) */ int xen_port; + /* mem_event bit for vcpu->pause_flags */ + int bit; + /* list of vcpus waiting for room in the ring */ + struct waitqueue_head wq; }; struct mem_event_per_domain @@ -615,9 +620,12 @@ static inline struct domain *next_domain /* VCPU affinity has changed: migrating to a new CPU. */ #define _VPF_migrating 3 #define VPF_migrating (1UL<<_VPF_migrating) - /* VCPU is blocked on memory-event ring. */ -#define _VPF_mem_event 4 -#define VPF_mem_event (1UL<<_VPF_mem_event) + /* VCPU is blocked due to missing mem_paging ring. */ +#define _VPF_mem_paging 4 +#define VPF_mem_paging (1UL<<_VPF_mem_paging) + /* VCPU is blocked due to missing mem_access ring. */ +#define _VPF_mem_access 5 +#define VPF_mem_access (1UL<<_VPF_mem_access) static inline int vcpu_runnable(struct vcpu *v) {
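To summarise the producer path for reviewers, here is a condensed sketch
(not the literal patch) of how requests reach the ring after this change,
using the names from the diff above; ring copying and error reporting are
trimmed to the essentials.

/* Sketch: guest vcpus sleep on med->wq until the ring has room,
 * foreign vcpus rely on the slot claimed in mem_event_check_ring(). */
static int _mem_event_put_request(struct domain *d,
                                  struct mem_event_domain *med,
                                  mem_event_request_t *req)
{
    int free_requests;

    mem_event_ring_lock(med);
    free_requests = RING_FREE_REQUESTS(&med->front_ring);
    if ( (current->domain == d && free_requests < med->foreign_producers) ||
         !free_requests )
    {
        /* No usable slot: the remaining slots are reserved for foreign
         * producers, or the ring is simply full. */
        mem_event_ring_unlock(med);
        return 0;
    }

    /* ... copy *req into the ring, push it, drop foreign_producers ... */

    mem_event_ring_unlock(med);
    notify_via_xen_event_channel(d, med->xen_port);
    return 1;
}

void mem_event_put_request(struct domain *d, struct mem_event_domain *med,
                           mem_event_request_t *req)
{
    if ( current->domain == d )
    {
        /* Guest context: block until _mem_event_put_request() succeeds.
         * p2m_*_resume() calls mem_event_wake_requesters(), which does
         * wake_up_nr(&med->wq, RING_FREE_REQUESTS(...)). */
        wait_event(med->wq, _mem_event_put_request(d, med, req));
        return;
    }

    /* Foreign context cannot sleep; the claimed slot should guarantee room. */
    _mem_event_put_request(d, med, req);
}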
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322737507 -3600 # Node ID 8147822efdee65d1f5b94656ab2032aedb76979f # Parent 612f69531fd15cf59c58404f6e4762733a9a268c xenpaging: use wait queues Use a wait queue to put a guest vcpu to sleep while the requested gfn is in paging state. This adds missing p2m_mem_paging_populate() calls to some callers of the new get_gfn* variants, which would crash now because they get an invalid mfn. It also fixes guest crashes due to unexpected returns from do_memory_op because copy_to/from_guest ran into a paged gfn. Now those places will always get a valid mfn. Since each gfn could be requested by several guest vcpus at the same time a queue of paged gfns is maintained. Each vcpu will be attached to that queue. Once p2m_mem_paging_resume restored the gfn the waiting vcpus will resume execution. There is untested code in p2m_mem_paging_init_queue() to allow cpu hotplug. Since each vcpu may wait on a different gfn there have to be as many queues as vcpus. But xl vcpu-set does not seem to work right now, so this code path cant be excercised right now. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r 612f69531fd1 -r 8147822efdee xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -454,6 +454,8 @@ int hvm_domain_initialise(struct domain spin_lock_init(&d->arch.hvm_domain.irq_lock); spin_lock_init(&d->arch.hvm_domain.uc_lock); + spin_lock_init(&d->arch.hvm_domain.gfn_lock); + INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list); spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock); diff -r 612f69531fd1 -r 8147822efdee xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -30,6 +30,7 @@ #include <asm/p2m.h> #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */ #include <xen/iommu.h> +#include <xen/wait.h> #include <asm/mem_event.h> #include <public/mem_event.h> #include <asm/mem_sharing.h> @@ -144,6 +145,182 @@ void p2m_change_entry_type_global(struct p2m_unlock(p2m); } +#ifdef __x86_64__ +struct p2m_mem_paging_queue { + struct list_head list; + struct waitqueue_head wq; + unsigned long gfn; + unsigned short waiters; + unsigned short woken; + unsigned short index; +}; + +struct p2m_mem_paging_queue_head { + struct list_head list; + unsigned int max; +}; + +int p2m_mem_paging_init_queue(struct domain *d, unsigned int max) +{ + struct p2m_mem_paging_queue_head *h; + struct p2m_mem_paging_queue *q; + unsigned int i, nr; + int ret = 0; + + spin_lock(&d->arch.hvm_domain.gfn_lock); + + if (!d->arch.hvm_domain.gfn_queue) { + ret = -ENOMEM; + h = xzalloc(struct p2m_mem_paging_queue_head); + if (!h) { + domain_crash(d); + goto out; + } + + INIT_LIST_HEAD(&h->list); + nr = max; + } else { + h = d->arch.hvm_domain.gfn_queue; + if (max <= h->max) + goto out; + nr = max - h->max; + } + + ret = -ENOMEM; + q = xzalloc_array(struct p2m_mem_paging_queue, nr); + if (!q) { + if (!d->arch.hvm_domain.gfn_queue) + xfree(h); + domain_crash(d); + goto out; + } + + for (i = 0; i < nr; i++) { + init_waitqueue_head(&q[i].wq); + INIT_LIST_HEAD(&q[i].list); + q[i].index = h->max + i + 1; + list_add_tail(&q[i].list, &h->list); + } + + h->max = max; + d->arch.hvm_domain.gfn_queue = h; + ret = 0; + +out: + spin_unlock(&d->arch.hvm_domain.gfn_lock); + return ret; +} + +static struct p2m_mem_paging_queue *p2m_mem_paging_get_queue(struct domain *d, unsigned long gfn) +{ + struct p2m_mem_paging_queue_head *h; + struct p2m_mem_paging_queue *q, *q_match, *q_free; + + h = d->arch.hvm_domain.gfn_queue; + q_match = q_free = NULL; + + 
spin_lock(&d->arch.hvm_domain.gfn_lock); + + list_for_each_entry(q, &h->list, list) { + if (q->gfn == gfn) { + q_match = q; + break; + } + if (!q_free && !q->waiters) + q_free = q; + } + + if (!q_match && q_free) + q_match = q_free; + + if (q_match) { + if (q_match->woken) + printk("wq woken for gfn %u:%u %lx %u %u %u\n", current->domain->domain_id, current->vcpu_id, gfn, q_match->index, q_match->woken, q_match->waiters); + q_match->waiters++; + q_match->gfn = gfn; + } + + if (!q_match) + printk("No wq_get for gfn %u:%u %lx\n", current->domain->domain_id, current->vcpu_id, gfn); + + spin_unlock(&d->arch.hvm_domain.gfn_lock); + return q_match; +} + +static void p2m_mem_paging_put_queue(struct domain *d, struct p2m_mem_paging_queue *q_match) +{ + spin_lock(&d->arch.hvm_domain.gfn_lock); + + if (q_match->waiters == 0) + printk("wq_put no waiters, gfn %u:%u %lx %u\n", current->domain->domain_id, current->vcpu_id, q_match->gfn, q_match->woken); + else if (--q_match->waiters == 0) + q_match->gfn = q_match->woken = 0;; + + spin_unlock(&d->arch.hvm_domain.gfn_lock); +} + +static void p2m_mem_paging_wake_queue(struct domain *d, unsigned long gfn) +{ + struct p2m_mem_paging_queue_head *h; + struct p2m_mem_paging_queue *q, *q_match = NULL; + + spin_lock(&d->arch.hvm_domain.gfn_lock); + + h = d->arch.hvm_domain.gfn_queue; + list_for_each_entry(q, &h->list, list) { + if (q->gfn == gfn) { + q_match = q; + break; + } + } + if (q_match) { + if (q_match->woken || q_match->waiters == 0) + printk("Wrong wake for gfn %u:%u %p %lx %u %u\n", current->domain->domain_id, current->vcpu_id, q_match, gfn, q_match->woken, q_match->waiters); + q_match->woken++; + wake_up_all(&q_match->wq); + } + spin_unlock(&d->arch.hvm_domain.gfn_lock); +} + +/* Returns 0 if the gfn is still paged */ +static int p2m_mem_paging_get_entry(mfn_t *mfn, + struct p2m_domain *p2m, unsigned long gfn, + p2m_type_t *t, p2m_access_t *a, p2m_query_t q, + unsigned int *page_order) +{ + *mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order); + + return p2m_is_paging(*t) ? 
0 : 1; +} + +/* Go to sleep in case of guest access */ +static void p2m_mem_paging_wait(mfn_t *mfn, + struct p2m_domain *p2m, unsigned long gfn, + p2m_type_t *t, p2m_access_t *a, p2m_query_t q, + unsigned int *page_order) +{ + struct p2m_mem_paging_queue *pmpq; + + /* Return p2mt as is in case of query */ + if ( q == p2m_query ) + return; + /* Foreign domains can not go to sleep */ + if ( current->domain != p2m->domain ) + return; + + pmpq = p2m_mem_paging_get_queue(p2m->domain, gfn); + if ( !pmpq ) + return; + + /* Populate the page once */ + if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged ) + p2m_mem_paging_populate(p2m->domain, gfn); + + wait_event(pmpq->wq, p2m_mem_paging_get_entry(mfn, p2m, gfn, t, a, q, page_order)); + p2m_mem_paging_put_queue(p2m->domain, pmpq); +} +#endif + mfn_t get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn, p2m_type_t *t, p2m_access_t *a, p2m_query_t q, unsigned int *page_order) @@ -161,6 +338,11 @@ mfn_t get_gfn_type_access(struct p2m_dom mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order); #ifdef __x86_64__ + if ( unlikely(p2m_is_paging(*t)) ) + p2m_mem_paging_wait(&mfn, p2m, gfn, t, a, q, page_order); +#endif + +#ifdef __x86_64__ if ( q == p2m_unshare && p2m_is_shared(*t) ) { ASSERT(!p2m_is_nestedp2m(p2m)); @@ -914,54 +1096,42 @@ void p2m_mem_paging_drop_page(struct dom void p2m_mem_paging_populate(struct domain *d, unsigned long gfn) { struct vcpu *v = current; - mem_event_request_t req; + mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn }; p2m_type_t p2mt; p2m_access_t a; mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); + int put_request = 0; /* Check that there''s space on the ring for this request */ if ( mem_event_check_ring(d, &d->mem_event->paging) ) return; - memset(&req, 0, sizeof(req)); - req.type = MEM_EVENT_TYPE_PAGING; - /* Fix p2m mapping */ p2m_lock(p2m); mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); - /* Allow only nominated or evicted pages to enter page-in path */ - if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged ) - { - /* Evict will fail now, tag this request for pager */ - if ( p2mt == p2m_ram_paging_out ) - req.flags |= MEM_EVENT_FLAG_EVICT_FAIL; + /* Forward the state only if gfn is in page-out path */ + if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged ) { + /* Ignore foreign requests to allow mmap in pager */ + if ( mfn_valid(mfn) && p2mt == p2m_ram_paging_out && v->domain == d ) { + /* Restore gfn because it is needed by guest before evict */ + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, + paging_mode_log_dirty(d) ? 
p2m_ram_logdirty : p2m_ram_rw, a); + } else { + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in_start, a); + put_request = 1; + } + /* Evict will fail now, the pager has to try another gfn */ - set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in_start, a); audit_p2m(p2m, 1); } p2m_unlock(p2m); - /* Pause domain if request came from guest and gfn has paging type */ - if ( p2m_is_paging(p2mt) && v->domain == d ) - { - vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; - } - /* No need to inform pager if the gfn is not in the page-out path */ - else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) - { - /* gfn is already on its way back and vcpu is not paused */ + /* One request per gfn, guest vcpus go to sleep, foreigners try again */ + if ( put_request ) + mem_event_put_request(d, &d->mem_event->paging, &req); + else mem_event_put_req_producers(d, &d->mem_event->paging); - return; - } - - /* Send request to pager */ - req.gfn = gfn; - req.p2mt = p2mt; - req.vcpu_id = v->vcpu_id; - - mem_event_put_request(d, &d->mem_event->paging, &req); } /** @@ -1062,12 +1232,11 @@ void p2m_mem_paging_resume(struct domain p2m_unlock(p2m); } - /* Unpause domain */ - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Wake vcpus waiting for room in the ring */ mem_event_wake_requesters(&d->mem_event->paging); + + /* Unpause all vcpus that were paused because the gfn was paged */ + p2m_mem_paging_wake_queue(d, rsp.gfn); } void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, diff -r 612f69531fd1 -r 8147822efdee xen/common/domctl.c --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -547,6 +547,9 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc goto maxvcpu_out; } + if ( p2m_mem_paging_init_queue(d, max) ) + goto maxvcpu_out; + ret = 0; maxvcpu_out: diff -r 612f69531fd1 -r 8147822efdee xen/include/asm-x86/hvm/domain.h --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -91,6 +91,9 @@ struct hvm_domain { struct viridian_domain viridian; + spinlock_t gfn_lock; + struct p2m_mem_paging_queue_head *gfn_queue; + bool_t hap_enabled; bool_t mem_sharing_enabled; bool_t qemu_mapcache_invalidate; diff -r 612f69531fd1 -r 8147822efdee xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -468,6 +468,8 @@ p2m_pod_offline_or_broken_replace(struct /* Modify p2m table for shared gfn */ int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn); +/* Initialize per-gfn wait queue */ +int p2m_mem_paging_init_queue(struct domain *d, unsigned int max); /* Check if a nominated gfn is valid to be paged out */ int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn); /* Evict a frame */
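The central piece of this patch is the wait loop below: a condensed sketch
(not the literal patch) of how a guest access to a paged gfn now blocks in
get_gfn_type_access() until the pager has restored the page, using the
helper names from the diff above.

/* Sketch: attach the vcpu to the per-gfn queue, ask the pager to populate
 * the gfn once, then sleep until the p2m entry leaves the paging states.
 * p2m_mem_paging_resume() wakes the queue that matches the gfn. */
static void p2m_mem_paging_wait(mfn_t *mfn, struct p2m_domain *p2m,
                                unsigned long gfn, p2m_type_t *t,
                                p2m_access_t *a, p2m_query_t q,
                                unsigned int *page_order)
{
    struct p2m_mem_paging_queue *pmpq;

    if ( q == p2m_query )                  /* plain queries never block */
        return;
    if ( current->domain != p2m->domain )  /* foreign vcpus must not sleep */
        return;

    pmpq = p2m_mem_paging_get_queue(p2m->domain, gfn);
    if ( !pmpq )
        return;

    /* Send the populate request only once per gfn */
    if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
        p2m_mem_paging_populate(p2m->domain, gfn);

    /* The condition re-reads the p2m entry and returns non-zero once the
     * gfn is no longer in a paging state. */
    wait_event(pmpq->wq,
               p2m_mem_paging_get_entry(mfn, p2m, gfn, t, a, q, page_order));

    p2m_mem_paging_put_queue(p2m->domain, pmpq);
}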
Olaf Hering
2011-Dec-01 11:09 UTC
[PATCH 3 of 4] xenpaging: add need_populate and paged_no_mfn checks
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322737586 -3600 # Node ID c09ac3717a025a8ead44bbc795fedda715d134c7 # Parent 8147822efdee65d1f5b94656ab2032aedb76979f xenpaging: add need_populate and paged_no_mfn checks There is currently a mix of p2mt checks for the various paging types. Some mean the p2mt needs to be populated, others mean a gfn without mfn. Add a new p2m_do_populate() helper which covers the p2m_ram_paged and p2m_ram_paging_out types. If a gfn is not in these states anymore another populate request for the pager is not needed. This avoids a call to p2m_mem_paging_populate() which in turn reduces the pressure on the ring buffer because no temporary slot needs to be claimed. As such, this helper is an optimization. Modify the existing p2m_is_paged() helper which now covers also p2m_ram_paging_in_start in addition to the current p2m_ram_paged type. A gfn in these two states is not backed by a mfn. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/hvm/emulate.c --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -66,7 +66,8 @@ static int hvmemul_do_io( ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt); if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(curr->domain, ram_gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(curr->domain, ram_gfn); put_gfn(curr->domain, ram_gfn); return X86EMUL_RETRY; } diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -363,7 +363,8 @@ static int hvm_set_ioreq_page( } if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(d, gmfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(d, gmfn); put_gfn(d, gmfn); return -ENOENT; } @@ -1300,7 +1301,7 @@ int hvm_hap_nested_page_fault(unsigned l #ifdef __x86_64__ /* Check if the page has been paged out */ - if ( p2m_is_paged(p2mt) || (p2mt == p2m_ram_paging_out) ) + if ( p2m_do_populate(p2mt) ) p2m_mem_paging_populate(v->domain, gfn); /* Mem sharing: unshare the page and try again */ @@ -1820,7 +1821,8 @@ static void *__hvm_map_guest_frame(unsig } if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(d, gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(d, gfn); put_gfn(d, gfn); return NULL; } @@ -2280,7 +2282,8 @@ static enum hvm_copy_result __hvm_copy( if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(curr->domain, gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(curr->domain, gfn); put_gfn(curr->domain, gfn); return HVMCOPY_gfn_paged_out; } @@ -3760,7 +3763,8 @@ long do_hvm_op(unsigned long op, XEN_GUE mfn_t mfn = get_gfn_unshare(d, pfn, &t); if ( p2m_is_paging(t) ) { - p2m_mem_paging_populate(d, pfn); + if ( p2m_do_populate(t) ) + p2m_mem_paging_populate(d, pfn); put_gfn(d, pfn); rc = -EINVAL; goto param_fail3; @@ -3864,7 +3868,8 @@ long do_hvm_op(unsigned long op, XEN_GUE mfn = get_gfn_unshare(d, pfn, &t); if ( p2m_is_paging(t) ) { - p2m_mem_paging_populate(d, pfn); + if ( p2m_do_populate(t) ) + p2m_mem_paging_populate(d, pfn); put_gfn(d, pfn); rc = -EINVAL; goto param_fail4; diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -3525,9 +3525,10 @@ int do_mmu_update( if ( !p2m_is_valid(p2mt) ) mfn = INVALID_MFN; - if ( p2m_is_paged(p2mt) ) + if ( p2m_is_paged(p2mt) && !mfn_valid(mfn) ) { - p2m_mem_paging_populate(pg_owner, gmfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(pg_owner, gmfn); put_gfn(pt_owner, gmfn); rc = -ENOENT; 
break; @@ -3565,15 +3566,10 @@ int do_mmu_update( l1emfn = mfn_x(get_gfn(pg_owner, l1egfn, &l1e_p2mt)); - if ( p2m_is_paged(l1e_p2mt) ) + if ( p2m_is_paged(l1e_p2mt) && !mfn_valid(l1emfn) ) { - p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); - put_gfn(pg_owner, l1egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l1e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l1e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); put_gfn(pg_owner, l1egfn); rc = -ENOENT; break; @@ -3613,15 +3609,10 @@ int do_mmu_update( l2emfn = mfn_x(get_gfn(pg_owner, l2egfn, &l2e_p2mt)); - if ( p2m_is_paged(l2e_p2mt) ) + if ( p2m_is_paged(l2e_p2mt) && !mfn_valid(l2emfn) ) { - p2m_mem_paging_populate(pg_owner, l2egfn); - put_gfn(pg_owner, l2egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l2e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l2e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l2egfn); put_gfn(pg_owner, l2egfn); rc = -ENOENT; break; @@ -3647,15 +3638,10 @@ int do_mmu_update( l3emfn = mfn_x(get_gfn(pg_owner, l3egfn, &l3e_p2mt)); - if ( p2m_is_paged(l3e_p2mt) ) + if ( p2m_is_paged(l3e_p2mt) && !mfn_valid(l3emfn) ) { - p2m_mem_paging_populate(pg_owner, l3egfn); - put_gfn(pg_owner, l3egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l3e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l3e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l3egfn); put_gfn(pg_owner, l3egfn); rc = -ENOENT; break; @@ -3681,15 +3667,10 @@ int do_mmu_update( l4emfn = mfn_x(get_gfn(pg_owner, l4egfn, &l4e_p2mt)); - if ( p2m_is_paged(l4e_p2mt) ) + if ( p2m_is_paged(l4e_p2mt) && !mfn_valid(l4emfn) ) { - p2m_mem_paging_populate(pg_owner, l4egfn); - put_gfn(pg_owner, l4egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l4e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l4e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l4egfn); put_gfn(pg_owner, l4egfn); rc = -ENOENT; break; diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/guest_walk.c --- a/xen/arch/x86/mm/guest_walk.c +++ b/xen/arch/x86/mm/guest_walk.c @@ -102,7 +102,8 @@ static inline void *map_domain_gfn(struc if ( p2m_is_paging(*p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); - p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); + if ( p2m_do_populate(*p2mt) ) + p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); __put_gfn(p2m, gfn_x(gfn)); *rc = _PAGE_PAGED; return NULL; diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/hap/guest_walk.c --- a/xen/arch/x86/mm/hap/guest_walk.c +++ b/xen/arch/x86/mm/hap/guest_walk.c @@ -64,7 +64,8 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); - p2m_mem_paging_populate(p2m->domain, cr3 >> PAGE_SHIFT); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(p2m->domain, cr3 >> PAGE_SHIFT); pfec[0] = PFEC_page_paged; __put_gfn(p2m, top_gfn); @@ -101,7 +102,8 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); - p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); pfec[0] = PFEC_page_paged; __put_gfn(p2m, gfn_x(gfn)); diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/p2m-ept.c --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -375,8 +375,7 @@ ept_set_entry(struct p2m_domain *p2m, un * Read-then-write is OK because we hold the p2m lock. 
*/ old_entry = *ept_entry; - if ( mfn_valid(mfn_x(mfn)) || direct_mmio || p2m_is_paged(p2mt) || - (p2mt == p2m_ram_paging_in_start) ) + if ( mfn_valid(mfn_x(mfn)) || direct_mmio || p2m_is_paged(p2mt) ) { /* Construct the new entry, and then write it once */ new_entry.emt = epte_get_entry_emt(p2m->domain, gfn, mfn, &ipat, diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -313,7 +313,7 @@ static void p2m_mem_paging_wait(mfn_t *m return; /* Populate the page once */ - if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged ) + if ( p2m_do_populate(*t) ) p2m_mem_paging_populate(p2m->domain, gfn); wait_event(pmpq->wq, p2m_mem_paging_get_entry(mfn, p2m, gfn, t, a, q, page_order)); @@ -1111,7 +1111,7 @@ void p2m_mem_paging_populate(struct doma p2m_lock(p2m); mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); /* Forward the state only if gfn is in page-out path */ - if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged ) { + if ( p2m_do_populate(p2mt) ) { /* Ignore foreign requests to allow mmap in pager */ if ( mfn_valid(mfn) && p2mt == p2m_ram_paging_out && v->domain == d ) { /* Restore gfn because it is needed by guest before evict */ diff -r 8147822efdee -r c09ac3717a02 xen/common/grant_table.c --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -163,7 +163,8 @@ static int __get_paged_frame(unsigned lo *frame = mfn_x(mfn); if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(rd, gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(rd, gfn); put_gfn(rd, gfn); rc = GNTST_eagain; } diff -r 8147822efdee -r c09ac3717a02 xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -158,7 +158,11 @@ typedef enum { | p2m_to_mask(p2m_ram_paging_in_start) \ | p2m_to_mask(p2m_ram_paging_in)) -#define P2M_PAGED_TYPES (p2m_to_mask(p2m_ram_paged)) +#define P2M_POPULATE_TYPES (p2m_to_mask(p2m_ram_paged) \ + | p2m_to_mask(p2m_ram_paging_out) ) + +#define P2M_PAGED_NO_MFN_TYPES (p2m_to_mask(p2m_ram_paged) \ + | p2m_to_mask(p2m_ram_paging_in_start) ) /* Shared types */ /* XXX: Sharable types could include p2m_ram_ro too, but we would need to @@ -183,7 +187,8 @@ typedef enum { #define p2m_has_emt(_t) (p2m_to_mask(_t) & (P2M_RAM_TYPES | p2m_to_mask(p2m_mmio_direct))) #define p2m_is_pageable(_t) (p2m_to_mask(_t) & P2M_PAGEABLE_TYPES) #define p2m_is_paging(_t) (p2m_to_mask(_t) & P2M_PAGING_TYPES) -#define p2m_is_paged(_t) (p2m_to_mask(_t) & P2M_PAGED_TYPES) +#define p2m_do_populate(_t) (p2m_to_mask(_t) & P2M_POPULATE_TYPES) +#define p2m_is_paged(_t) (p2m_to_mask(_t) & P2M_PAGED_NO_MFN_TYPES) #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES) #define p2m_is_shared(_t) (p2m_to_mask(_t) & P2M_SHARED_TYPES) #define p2m_is_broken(_t) (p2m_to_mask(_t) & P2M_BROKEN_TYPES)
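In short, the patch splits the old p2m_is_paged() check into two predicates.
The sketch below (macro names as in the diff above, the caller snippet is
illustrative) shows the masks and the caller idiom they enable.

/* gfn still needs a populate request sent to the pager */
#define P2M_POPULATE_TYPES     (p2m_to_mask(p2m_ram_paged) \
                              | p2m_to_mask(p2m_ram_paging_out))

/* gfn currently has no backing mfn */
#define P2M_PAGED_NO_MFN_TYPES (p2m_to_mask(p2m_ram_paged) \
                              | p2m_to_mask(p2m_ram_paging_in_start))

#define p2m_do_populate(_t) (p2m_to_mask(_t) & P2M_POPULATE_TYPES)
#define p2m_is_paged(_t)    (p2m_to_mask(_t) & P2M_PAGED_NO_MFN_TYPES)

/* Typical caller after this patch: only nudge the pager while the gfn is
 * still on the page-out path, otherwise just back off and retry. */
if ( p2m_is_paging(p2mt) )
{
    if ( p2m_do_populate(p2mt) )
        p2m_mem_paging_populate(d, gfn);
    put_gfn(d, gfn);
    return -ENOENT;   /* or X86EMUL_RETRY / GNTST_eagain, per call site */
}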
Olaf Hering
2011-Dec-01 11:09 UTC
[PATCH 4 of 4] Add tracing to mem_event and p2m_mem_paging functions
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322737677 -3600 # Node ID 0bf827a48a3c9efd43fe04c291b553d2bec46f8b # Parent c09ac3717a025a8ead44bbc795fedda715d134c7 Add tracing to mem_event and p2m_mem_paging functions Maintaining these trace_var calls out-of-tree became a burden and made debugging paging related issues harder. So putting it into the tree will ease debugging with xenalyze. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r c09ac3717a02 -r 0bf827a48a3c xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -24,6 +24,7 @@ #include <asm/domain.h> #include <xen/event.h> #include <xen/wait.h> +#include <xen/trace.h> #include <asm/p2m.h> #include <asm/mem_event.h> #include <asm/mem_paging.h> @@ -149,6 +150,13 @@ static int _mem_event_put_request(struct mem_event_front_ring_t *front_ring; int free_requests; RING_IDX req_prod; + struct trc_mem_event_put_request T = { .td = d->domain_id }; + int ret = 0; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.ring_bit = med->bit; + T.gfn = req->gfn; mem_event_ring_lock(med); @@ -156,7 +164,7 @@ static int _mem_event_put_request(struct /* Requests from foreign domain were claimed in mem_event_check_ring() */ if ((current->domain == d && free_requests < med->foreign_producers) || !free_requests) { mem_event_ring_unlock(med); - return 0; + goto out; } front_ring = &med->front_ring; @@ -176,7 +184,15 @@ static int _mem_event_put_request(struct notify_via_xen_event_channel(d, med->xen_port); - return 1; + ret = 1; + +out: + T.ret = ret; + T.room = free_requests; + T.foreign_producers = med->foreign_producers + ret; + trace_var(TRC_MEM_EVENT_PUT_REQUEST, 1, sizeof(T), &T); + + return ret; } void mem_event_put_request(struct domain *d, struct mem_event_domain *med, diff -r c09ac3717a02 -r 0bf827a48a3c xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -31,6 +31,7 @@ #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */ #include <xen/iommu.h> #include <xen/wait.h> +#include <xen/trace.h> #include <asm/mem_event.h> #include <public/mem_event.h> #include <asm/mem_sharing.h> @@ -215,7 +216,12 @@ static struct p2m_mem_paging_queue *p2m_ { struct p2m_mem_paging_queue_head *h; struct p2m_mem_paging_queue *q, *q_match, *q_free; + struct trc_mem_p2m_paging_queue T = { .td = d->domain_id }; + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + h = d->arch.hvm_domain.gfn_queue; q_match = q_free = NULL; @@ -238,19 +244,37 @@ static struct p2m_mem_paging_queue *p2m_ printk("wq woken for gfn %u:%u %lx %u %u %u\n", current->domain->domain_id, current->vcpu_id, gfn, q_match->index, q_match->woken, q_match->waiters); q_match->waiters++; q_match->gfn = gfn; + + T.waiters = q_match->waiters; + T.woken = q_match->woken; + T.index = q_match->index; } + trace_var(TRC_MEM_P2M_PAGING_GET_QUEUE, 1, sizeof(T), &T); + if (!q_match) printk("No wq_get for gfn %u:%u %lx\n", current->domain->domain_id, current->vcpu_id, gfn); + spin_unlock(&d->arch.hvm_domain.gfn_lock); return q_match; } static void p2m_mem_paging_put_queue(struct domain *d, struct p2m_mem_paging_queue *q_match) { + struct trc_mem_p2m_paging_queue T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = q_match->gfn; + T.waiters = q_match->waiters; + T.woken = q_match->woken; + T.index = q_match->index; + spin_lock(&d->arch.hvm_domain.gfn_lock); + trace_var(TRC_MEM_P2M_PAGING_PUT_QUEUE, 1, sizeof(T), &T); + if (q_match->waiters == 0) 
printk("wq_put no waiters, gfn %u:%u %lx %u\n", current->domain->domain_id, current->vcpu_id, q_match->gfn, q_match->woken); else if (--q_match->waiters == 0) @@ -263,6 +287,11 @@ static void p2m_mem_paging_wake_queue(st { struct p2m_mem_paging_queue_head *h; struct p2m_mem_paging_queue *q, *q_match = NULL; + struct trc_mem_p2m_paging_queue T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; spin_lock(&d->arch.hvm_domain.gfn_lock); @@ -277,8 +306,15 @@ static void p2m_mem_paging_wake_queue(st if (q_match->woken || q_match->waiters == 0) printk("Wrong wake for gfn %u:%u %p %lx %u %u\n", current->domain->domain_id, current->vcpu_id, q_match, gfn, q_match->woken, q_match->waiters); q_match->woken++; + + T.waiters = q_match->waiters; + T.woken = q_match->woken; + T.index = q_match->index; + wake_up_all(&q_match->wq); } + trace_var(TRC_MEM_P2M_PAGING_WAKE_QUEUE, 1, sizeof(T), &T); + spin_unlock(&d->arch.hvm_domain.gfn_lock); } @@ -937,34 +973,45 @@ int p2m_mem_paging_nominate(struct domai p2m_access_t a; mfn_t mfn; int ret; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; p2m_lock(p2m); mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); + T.reason++; /* Check if mfn is valid */ ret = -EINVAL; if ( !mfn_valid(mfn) ) goto out; + T.reason++; /* Check p2m type */ ret = -EAGAIN; if ( !p2m_is_pageable(p2mt) ) goto out; + T.reason++; /* Check for io memory page */ if ( is_iomem_page(mfn_x(mfn)) ) goto out; + T.reason++; /* Check page count and type */ page = mfn_to_page(mfn); if ( (page->count_info & (PGC_count_mask | PGC_allocated)) ! (1 | PGC_allocated) ) goto out; + T.reason++; if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_none ) goto out; + T.reason++; /* Fix p2m entry */ set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_out, a); audit_p2m(p2m, 1); @@ -972,6 +1019,10 @@ int p2m_mem_paging_nominate(struct domai out: p2m_unlock(p2m); + + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + trace_var(TRC_MEM_P2M_PAGING_NOMINATE, 1, sizeof(T), &T); return ret; } @@ -1002,6 +1053,11 @@ int p2m_mem_paging_evict(struct domain * mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); int ret = -EINVAL; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; p2m_lock(p2m); @@ -1010,24 +1066,29 @@ int p2m_mem_paging_evict(struct domain * if ( unlikely(!mfn_valid(mfn)) ) goto out; + T.reason++; /* Allow only nominated pages */ if ( p2mt != p2m_ram_paging_out ) goto out; + T.reason++; ret = -EBUSY; /* Get the page so it doesn''t get modified under Xen''s feet */ page = mfn_to_page(mfn); if ( unlikely(!get_page(page, d)) ) goto out; + T.reason++; /* Check page count and type once more */ if ( (page->count_info & (PGC_count_mask | PGC_allocated)) ! 
(2 | PGC_allocated) ) goto out_put; + T.reason++; if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_none ) goto out_put; + T.reason++; /* Decrement guest domain''s ref count of the page */ if ( test_and_clear_bit(_PGC_allocated, &page->count_info) ) put_page(page); @@ -1050,6 +1111,12 @@ int p2m_mem_paging_evict(struct domain * out: p2m_unlock(p2m); + + T.flag_evict_fail = !!ret; + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + trace_var(TRC_MEM_P2M_PAGING_EVICT, 1, sizeof(T), &T); + return ret; } @@ -1065,10 +1132,16 @@ int p2m_mem_paging_evict(struct domain * void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) { mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn }; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; /* Send release notification to pager */ req.flags = MEM_EVENT_FLAG_DROP_PAGE; + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + trace_var(TRC_MEM_P2M_PAGING_DROP, 1, sizeof(T), &T); + mem_event_put_request(d, &d->mem_event->paging, &req); } @@ -1101,11 +1174,13 @@ void p2m_mem_paging_populate(struct doma p2m_access_t a; mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); - int put_request = 0; + int put_request = 0, ring_full; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) ) - return; + ring_full = mem_event_check_ring(d, &d->mem_event->paging); + if ( ring_full ) + goto trace; /* Fix p2m mapping */ p2m_lock(p2m); @@ -1127,11 +1202,24 @@ void p2m_mem_paging_populate(struct doma } p2m_unlock(p2m); - /* One request per gfn, guest vcpus go to sleep, foreigners try again */ - if ( put_request ) - mem_event_put_request(d, &d->mem_event->paging, &req); - else - mem_event_put_req_producers(d, &d->mem_event->paging); + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + +trace: + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + T.reason = ring_full; + T.flag_drop_page = put_request; + trace_var(TRC_MEM_P2M_PAGING_POPULATE, 1, sizeof(T), &T); + + if ( !ring_full ) { + /* One request per gfn, guest vcpus go to sleep, foreigners try again */ + if ( put_request ) + mem_event_put_request(d, &d->mem_event->paging, &req); + else + mem_event_put_req_producers(d, &d->mem_event->paging); + } } /** @@ -1153,6 +1241,7 @@ int p2m_mem_paging_prep(struct domain *d mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); int ret; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; p2m_lock(p2m); @@ -1184,6 +1273,15 @@ int p2m_mem_paging_prep(struct domain *d out: p2m_unlock(p2m); + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + T.reason = ret; + trace_var(TRC_MEM_P2M_PAGING_PREP, 1, sizeof(T), &T); + return ret; } @@ -1209,6 +1307,7 @@ void p2m_mem_paging_resume(struct domain p2m_type_t p2mt; p2m_access_t a; mfn_t mfn; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; /* Pull the response off the ring */ mem_event_get_response(&d->mem_event->paging, &rsp); @@ -1230,8 +1329,16 @@ void p2m_mem_paging_resume(struct domain audit_p2m(p2m, 1); } p2m_unlock(p2m); + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; } + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = rsp.gfn; + T.flag_drop_page = !!(rsp.flags & MEM_EVENT_FLAG_DROP_PAGE); + trace_var(TRC_MEM_P2M_PAGING_RESUME, 1, sizeof(T), &T); + /* Wake vcpus waiting for room in the ring */ mem_event_wake_requesters(&d->mem_event->paging); diff -r c09ac3717a02 -r 
0bf827a48a3c xen/include/public/trace.h --- a/xen/include/public/trace.h +++ b/xen/include/public/trace.h @@ -94,6 +94,37 @@ #define TRC_MEM_POD_ZERO_RECLAIM (TRC_MEM + 17) #define TRC_MEM_POD_SUPERPAGE_SPLINTER (TRC_MEM + 18) +#define TRC_MEM_EVENT_PUT_REQUEST (TRC_MEM + 19) +struct trc_mem_event_put_request { + unsigned int d:16, v:16; + unsigned int td:16, room:16; + unsigned int gfn; + unsigned int foreign_producers:16, ret:1, ring_bit:5; +} __attribute__((packed)); + +#define TRC_MEM_P2M_PAGING_NOMINATE (TRC_MEM + 20) +#define TRC_MEM_P2M_PAGING_EVICT (TRC_MEM + 21) +#define TRC_MEM_P2M_PAGING_DROP (TRC_MEM + 22) +#define TRC_MEM_P2M_PAGING_POPULATE (TRC_MEM + 23) +#define TRC_MEM_P2M_PAGING_PREP (TRC_MEM + 24) +#define TRC_MEM_P2M_PAGING_RESUME (TRC_MEM + 25) +struct trc_mem_p2m_paging { + unsigned int d:16, v:16; + unsigned int td:16, p2mt:5, reason:5, flag_evict_fail:1, flag_drop_page:1; + unsigned int gfn; + unsigned int mfn; +} __attribute__((packed)); + +#define TRC_MEM_P2M_PAGING_GET_QUEUE (TRC_MEM + 26) +#define TRC_MEM_P2M_PAGING_PUT_QUEUE (TRC_MEM + 27) +#define TRC_MEM_P2M_PAGING_WAKE_QUEUE (TRC_MEM + 28) +struct trc_mem_p2m_paging_queue { + unsigned int d:16, v:16; + unsigned int td:16, woken:8; + unsigned int gfn; + unsigned int index:16, waiters:16; +} __attribute__((packed)); + #define TRC_PV_HYPERCALL (TRC_PV + 1) #define TRC_PV_TRAP (TRC_PV + 3)
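For orientation when reading the hooks above: each hook fills one of the
packed records from trace.h and hands it to trace_var() so xenalyze can
decode it. A minimal sketch of that pattern follows; the helper name is
made up for illustration, the fields and the trace_var() call mirror the
diff above.

#include <xen/trace.h>

/* Sketch: emit one paging trace record (illustrative helper only). */
static void trace_paging_drop(struct domain *d, unsigned long gfn)
{
    struct trc_mem_p2m_paging T = { .td = d->domain_id };

    T.d   = current->domain->domain_id;  /* domain/vcpu doing the drop */
    T.v   = current->vcpu_id;
    T.gfn = gfn;
    T.flag_drop_page = 1;

    /* second argument is 1, as in the hooks above, to timestamp the event */
    trace_var(TRC_MEM_P2M_PAGING_DROP, 1, sizeof(T), &T);
}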
At 12:09 +0100 on 01 Dec (1322741356), Olaf Hering wrote:
> The following series is my current state of an attempt to use wait queues for
> xenpaging and mem_event handling. I post it here for review and comments.
>
> This series conflicts with work from Andres Lagar-Cavilla.

Can you be clear about which of Andres's patchsets you are objecting to?
I have five series (!) of his on my todo list right now.

Tim.
On Thu, Dec 01, Tim Deegan wrote:
> At 12:09 +0100 on 01 Dec (1322741356), Olaf Hering wrote:
> > The following series is my current state of an attempt to use wait queues for
> > xenpaging and mem_event handling. I post it here for review and comments.
> >
> > This series conflicts with work from Andres Lagar-Cavilla.
>
> Can you be clear about which of Andres's patchsets you are objecting to?
> I have five series (!) of his on my todo list right now.

The "Improve ring management for memory events" change which tries to
achieve the same result as my "mem_event: use wait queue when ring is full"
change. There is already a discussion about which way to choose.

Olaf
Hi,

Is there a more recent version of this patch? I see a newer version of
the wait-queues-for-mem-events one but not of this.

At 12:09 +0100 on 01 Dec (1322741358), Olaf Hering wrote:
> # HG changeset patch
> # User Olaf Hering <olaf@aepfle.de>
> # Date 1322737507 -3600
> # Node ID 8147822efdee65d1f5b94656ab2032aedb76979f
> # Parent 612f69531fd15cf59c58404f6e4762733a9a268c
> xenpaging: use wait queues
>
> Use a wait queue to put a guest vcpu to sleep while the requested gfn is
> in paging state. This adds missing p2m_mem_paging_populate() calls to
> some callers of the new get_gfn* variants, which would crash now
> because they get an invalid mfn. It also fixes guest crashes due to
> unexpected returns from do_memory_op because copy_to/from_guest ran into
> a paged gfn. Now those places will always get a valid mfn.
>
> Since each gfn could be requested by several guest vcpus at the same
> time a queue of paged gfns is maintained. Each vcpu will be attached to
> that queue. Once p2m_mem_paging_resume restored the gfn the waiting
> vcpus will resume execution.
>
> There is untested code in p2m_mem_paging_init_queue() to allow cpu
> hotplug. Since each vcpu may wait on a different gfn there have to be as
> many queues as vcpus. But xl vcpu-set does not seem to work right now,
> so this code path cant be excercised right now.
>
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>
> diff -r 612f69531fd1 -r 8147822efdee xen/arch/x86/hvm/hvm.c
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -454,6 +454,8 @@ int hvm_domain_initialise(struct domain
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);
>
> +    spin_lock_init(&d->arch.hvm_domain.gfn_lock);
> +

I agree with Andres's comments on this lock, and will go further. Please:

 - add it to the mm-locks.h system so we see any ordering violations in
   future patches.
 - add a detailed comment there explaining what exactly this lock protects.

Cheers,

Tim.
On Thu, Dec 15, Tim Deegan wrote:
> Is there a more recent version of this patch? I see a newer version of
> the wait-queues-for-mem-events one but not of this.

The version I sent is not final; PV guests won't start, for example. I will
address the comments from Andres and you and send out a better version.

Olaf