The following series is my current state of an attempt to use wait queues for
xenpaging and mem_event handling. I post it here for review and comments.

This series conflicts with work from Andres Lagar-Cavilla.

Olaf

 xen/arch/x86/hvm/emulate.c       |    3
 xen/arch/x86/hvm/hvm.c           |   21 +-
 xen/arch/x86/mm.c                |   49 +----
 xen/arch/x86/mm/guest_walk.c     |    3
 xen/arch/x86/mm/hap/guest_walk.c |    6
 xen/arch/x86/mm/mem_event.c      |  150 +++++++++++++--
 xen/arch/x86/mm/mem_sharing.c    |   27 +-
 xen/arch/x86/mm/p2m-ept.c        |    3
 xen/arch/x86/mm/p2m.c            |  378 +++++++++++++++++++++++++++++++++------
 xen/common/domctl.c              |    3
 xen/common/grant_table.c         |    3
 xen/include/asm-x86/hvm/domain.h |    3
 xen/include/asm-x86/mem_event.h  |    8
 xen/include/asm-x86/p2m.h        |   11 -
 xen/include/public/trace.h       |   31 +++
 xen/include/xen/sched.h          |   16 +
 16 files changed, 569 insertions(+), 146 deletions(-)
Olaf Hering
2011-Dec-01 11:09 UTC
[PATCH 1 of 4] mem_event: use wait queue when ring is full
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322736732 -3600 # Node ID 612f69531fd15cf59c58404f6e4762733a9a268c # Parent 3c4c29899d8a2a0c0f9109b7236da95bb72b77b6 mem_event: use wait queue when ring is full This change is based on an idea/patch from Adin Scannell. If the ring is full, put the current vcpu to sleep if it belongs to the target domain. The wakeup happens in the p2m_*_resume functions. Wakeup will take the number of free slots into account. A request from another domain will use the existing ->req_producers functionality because sleeping is not possible if the vcpu does not belong to the target domain. This change fixes also a bug in p2m_mem_paging_drop_page(). Up to now a full ring will lead to harmless inconsistency in the pager. v4: - fix off-by-one bug in _mem_event_put_request - add mem_event_wake_requesters() and use wake_up_nr() - rename mem_event_mark_and_pause() and mem_event_mark_and_pause() functions - req_producers counts foreign request producers, rename member v3: - rename ->mem_event_bit to ->bit - remove me_ from new VPF_ defines v2: - p2m_mem_paging_populate: move vcpu_pause after put_request, otherwise the vcpu will not wake_up after a wait_event because the pause_count was increased twice. Fixes guest hangs. - update free space check in _mem_event_put_request() - simplify mem_event_put_request() Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4107,7 +4107,7 @@ static int hvm_memory_event_traps(long p return 1; rc = mem_event_check_ring(d, &d->mem_event->access); - if ( rc ) + if ( rc < 0 ) return rc; memset(&req, 0, sizeof(req)); diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -23,6 +23,7 @@ #include <asm/domain.h> #include <xen/event.h> +#include <xen/wait.h> #include <asm/p2m.h> #include <asm/mem_event.h> #include <asm/mem_paging.h> @@ -39,6 +40,7 @@ static int mem_event_enable(struct domain *d, xen_domctl_mem_event_op_t *mec, + int bit, struct mem_event_domain *med) { int rc; @@ -107,8 +109,12 @@ static int mem_event_enable(struct domai mem_event_ring_lock_init(med); + med->bit = bit; + + init_waitqueue_head(&med->wq); + /* Wake any VCPUs paused for memory events */ - mem_event_unpause_vcpus(d); + mem_event_wake_waiters(d, med); return 0; @@ -124,6 +130,9 @@ static int mem_event_enable(struct domai static int mem_event_disable(struct mem_event_domain *med) { + if (!list_empty(&med->wq.list)) + return -EBUSY; + unmap_domain_page(med->ring_page); med->ring_page = NULL; @@ -133,13 +142,23 @@ static int mem_event_disable(struct mem_ return 0; } -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +static int _mem_event_put_request(struct domain *d, + struct mem_event_domain *med, + mem_event_request_t *req) { mem_event_front_ring_t *front_ring; + int free_requests; RING_IDX req_prod; mem_event_ring_lock(med); + free_requests = RING_FREE_REQUESTS(&med->front_ring); + /* Requests from foreign domain were claimed in mem_event_check_ring() */ + if ((current->domain == d && free_requests < med->foreign_producers) || !free_requests) { + mem_event_ring_unlock(med); + return 0; + } + front_ring = &med->front_ring; req_prod = front_ring->req_prod_pvt; @@ -148,13 +167,30 @@ void mem_event_put_request(struct domain req_prod++; /* Update ring */ - med->req_producers--; + if (current->domain != d) + 
med->foreign_producers--; front_ring->req_prod_pvt = req_prod; RING_PUSH_REQUESTS(front_ring); mem_event_ring_unlock(med); notify_via_xen_event_channel(d, med->xen_port); + + return 1; +} + +void mem_event_put_request(struct domain *d, struct mem_event_domain *med, + mem_event_request_t *req) +{ + /* Go to sleep if request came from guest */ + if (current->domain == d) { + wait_event(med->wq, _mem_event_put_request(d, med, req)); + return; + } + /* Ring was full anyway, unable to sleep in non-guest context */ + if (!_mem_event_put_request(d, med, req)) + printk("Failed to put memreq: d %u t %x f %x gfn %lx\n", d->domain_id, + req->type, req->flags, (unsigned long)req->gfn); } void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp) @@ -178,31 +214,95 @@ void mem_event_get_response(struct mem_e mem_event_ring_unlock(med); } -void mem_event_unpause_vcpus(struct domain *d) +/** + * mem_event_wake_requesters - Wake vcpus waiting for room in the ring + * @d: guest domain + * @med: mem_event ring + * + * mem_event_wake_requesters() will wakeup vcpus waiting for room in the + * ring. Only as many as can place another request in the ring will + * resume execution. + */ +void mem_event_wake_requesters(struct mem_event_domain *med) +{ + int free_requests; + + mem_event_ring_lock(med); + + free_requests = RING_FREE_REQUESTS(&med->front_ring); + if ( free_requests ) + wake_up_nr(&med->wq, free_requests); + + mem_event_ring_unlock(med); +} + +/** + * mem_event_wake_waiters - Wake all vcpus waiting for the ring + * @d: guest domain + * @med: mem_event ring + * + * mem_event_wake_waiters() will wakeup all vcpus waiting for the ring to + * become available. + */ +void mem_event_wake_waiters(struct domain *d, struct mem_event_domain *med) { struct vcpu *v; for_each_vcpu ( d, v ) - if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) + if ( test_and_clear_bit(med->bit, &v->pause_flags) ) vcpu_wake(v); } -void mem_event_mark_and_pause(struct vcpu *v) +/** + * mem_event_mark_and_sleep - Put vcpu to sleep + * @v: guest vcpu + * @med: mem_event ring + * + * mem_event_mark_and_sleep() tags vcpu and put it to sleep. + * The vcpu will resume execution in mem_event_wake_waiters(). + */ +void mem_event_mark_and_sleep(struct vcpu *v, struct mem_event_domain *med) { - set_bit(_VPF_mem_event, &v->pause_flags); + set_bit(med->bit, &v->pause_flags); vcpu_sleep_nosync(v); } -void mem_event_put_req_producers(struct mem_event_domain *med) +/** + * mem_event_put_req_producers - Release a claimed slot + * @med: mem_event ring + * + * mem_event_put_req_producers() releases a claimed slot in the mem_event ring. + */ +void mem_event_put_req_producers(struct domain *d, struct mem_event_domain *med) { + if ( current->domain == d ) + return; + mem_event_ring_lock(med); - med->req_producers--; + med->foreign_producers--; mem_event_ring_unlock(med); } +/** + * mem_event_check_ring - Check state of a mem_event ring + * @d: guest domain + * @med: mem_event ring + * + * Return codes: < 0: the ring is not yet configured + * 0: the ring has some room + * > 0: the ring is full + * + * mem_event_check_ring() checks the state of the given mem_event ring. + * If the current vcpu belongs to the guest domain, the function assumes that + * mem_event_put_request() will sleep until the ring has room again. + * + * If the current vcpu does not belong to the target domain the caller must try + * again until there is room. A slot is claimed and the caller can place a + * request. 
If the caller does not need to send a request, the claimed slot has + * to be released with mem_event_put_req_producers(). + */ int mem_event_check_ring(struct domain *d, struct mem_event_domain *med) { - struct vcpu *curr = current; int free_requests; int ring_full = 1; @@ -212,15 +312,15 @@ int mem_event_check_ring(struct domain * mem_event_ring_lock(med); free_requests = RING_FREE_REQUESTS(&med->front_ring); - if ( med->req_producers < free_requests ) + + if ( current->domain == d ) + ring_full = 0; + else if ( med->foreign_producers < free_requests ) { - med->req_producers++; + med->foreign_producers++; ring_full = 0; } - if ( ring_full && (curr->domain == d) ) - mem_event_mark_and_pause(curr); - mem_event_ring_unlock(med); return ring_full; @@ -287,7 +387,7 @@ int mem_event_domctl(struct domain *d, x if ( p2m->pod.entry_count ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, _VPF_mem_paging, med); } break; @@ -326,7 +426,7 @@ int mem_event_domctl(struct domain *d, x if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, _VPF_mem_access, med); } break; diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -253,19 +253,11 @@ static void mem_sharing_audit(void) #endif -static struct page_info* mem_sharing_alloc_page(struct domain *d, - unsigned long gfn) +static void mem_sharing_notify_helper(struct domain *d, unsigned long gfn) { - struct page_info* page; struct vcpu *v = current; mem_event_request_t req; - page = alloc_domheap_page(d, 0); - if(page != NULL) return page; - - memset(&req, 0, sizeof(req)); - req.type = MEM_EVENT_TYPE_SHARED; - if ( v->domain != d ) { /* XXX This path needs some attention. For now, just fail foreign @@ -275,20 +267,20 @@ static struct page_info* mem_sharing_all gdprintk(XENLOG_ERR, "Failed alloc on unshare path for foreign (%d) lookup\n", d->domain_id); - return page; + return; } - vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; + if (mem_event_check_ring(d, &d->mem_event->share) < 0) + return; - if(mem_event_check_ring(d, &d->mem_event->share)) return page; - + memset(&req, 0, sizeof(req)); + req.type = MEM_EVENT_TYPE_SHARED; + req.flags = MEM_EVENT_FLAG_VCPU_PAUSED; req.gfn = gfn; req.p2mt = p2m_ram_shared; req.vcpu_id = v->vcpu_id; mem_event_put_request(d, &d->mem_event->share, &req); - - return page; + vcpu_pause_nosync(v); } unsigned int mem_sharing_get_nr_saved_mfns(void) @@ -653,7 +645,7 @@ gfn_found: if(ret == 0) goto private_page_found; old_page = page; - page = mem_sharing_alloc_page(d, gfn); + page = alloc_domheap_page(d, 0); if(!page) { /* We''ve failed to obtain memory for private page. 
Need to re-add the @@ -661,6 +653,7 @@ gfn_found: list_add(&gfn_info->list, &hash_entry->gfns); put_gfn(d, gfn); shr_unlock(); + mem_sharing_notify_helper(d, gfn); return -ENOMEM; } diff -r 3c4c29899d8a -r 612f69531fd1 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -882,20 +882,12 @@ int p2m_mem_paging_evict(struct domain * */ void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) { - struct vcpu *v = current; - mem_event_request_t req; + mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn }; - /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) == 0) - { - /* Send release notification to pager */ - memset(&req, 0, sizeof(req)); - req.flags |= MEM_EVENT_FLAG_DROP_PAGE; - req.gfn = gfn; - req.vcpu_id = v->vcpu_id; + /* Send release notification to pager */ + req.flags = MEM_EVENT_FLAG_DROP_PAGE; - mem_event_put_request(d, &d->mem_event->paging, &req); - } + mem_event_put_request(d, &d->mem_event->paging, &req); } /** @@ -960,7 +952,7 @@ void p2m_mem_paging_populate(struct doma else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) { /* gfn is already on its way back and vcpu is not paused */ - mem_event_put_req_producers(&d->mem_event->paging); + mem_event_put_req_producers(d, &d->mem_event->paging); return; } @@ -1074,8 +1066,8 @@ void p2m_mem_paging_resume(struct domain if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Unpause any domains that were paused because the ring was full */ - mem_event_unpause_vcpus(d); + /* Wake vcpus waiting for room in the ring */ + mem_event_wake_requesters(&d->mem_event->paging); } void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, @@ -1114,7 +1106,7 @@ void p2m_mem_access_check(unsigned long "Memory access permissions failure, no mem_event listener: pausing VCPU %d, dom %d\n", v->vcpu_id, d->domain_id); - mem_event_mark_and_pause(v); + mem_event_mark_and_sleep(v, &d->mem_event->access); } else { @@ -1135,7 +1127,7 @@ void p2m_mem_access_check(unsigned long /* Pause the current VCPU unconditionally */ vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; + req.flags = MEM_EVENT_FLAG_VCPU_PAUSED; /* Send request to mem event */ req.gfn = gfn; @@ -1164,9 +1156,11 @@ void p2m_mem_access_resume(struct p2m_do if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Unpause any domains that were paused because the ring was full or no listener - * was available */ - mem_event_unpause_vcpus(d); + /* Wake vcpus waiting for room in the ring */ + mem_event_wake_requesters(&d->mem_event->access); + + /* Unpause all vcpus that were paused because no listener was available */ + mem_event_wake_waiters(d, &d->mem_event->access); } diff -r 3c4c29899d8a -r 612f69531fd1 xen/include/asm-x86/mem_event.h --- a/xen/include/asm-x86/mem_event.h +++ b/xen/include/asm-x86/mem_event.h @@ -24,13 +24,13 @@ #ifndef __MEM_EVENT_H__ #define __MEM_EVENT_H__ -/* Pauses VCPU while marking pause flag for mem event */ -void mem_event_mark_and_pause(struct vcpu *v); int mem_event_check_ring(struct domain *d, struct mem_event_domain *med); -void mem_event_put_req_producers(struct mem_event_domain *med); +void mem_event_put_req_producers(struct domain *d, struct mem_event_domain *med); void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req); void mem_event_get_response(struct mem_event_domain *med, 
mem_event_response_t *rsp); -void mem_event_unpause_vcpus(struct domain *d); +void mem_event_wake_requesters(struct mem_event_domain *med); +void mem_event_wake_waiters(struct domain *d, struct mem_event_domain *med); +void mem_event_mark_and_sleep(struct vcpu *v, struct mem_event_domain *med); int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl); diff -r 3c4c29899d8a -r 612f69531fd1 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -14,6 +14,7 @@ #include <xen/nodemask.h> #include <xen/radix-tree.h> #include <xen/multicall.h> +#include <xen/wait.h> #include <public/xen.h> #include <public/domctl.h> #include <public/sysctl.h> @@ -183,7 +184,7 @@ struct mem_event_domain { /* ring lock */ spinlock_t ring_lock; - unsigned int req_producers; + unsigned int foreign_producers; /* shared page */ mem_event_shared_page_t *shared_page; /* shared ring page */ @@ -192,6 +193,10 @@ struct mem_event_domain mem_event_front_ring_t front_ring; /* event channel port (vcpu0 only) */ int xen_port; + /* mem_event bit for vcpu->pause_flags */ + int bit; + /* list of vcpus waiting for room in the ring */ + struct waitqueue_head wq; }; struct mem_event_per_domain @@ -615,9 +620,12 @@ static inline struct domain *next_domain /* VCPU affinity has changed: migrating to a new CPU. */ #define _VPF_migrating 3 #define VPF_migrating (1UL<<_VPF_migrating) - /* VCPU is blocked on memory-event ring. */ -#define _VPF_mem_event 4 -#define VPF_mem_event (1UL<<_VPF_mem_event) + /* VCPU is blocked due to missing mem_paging ring. */ +#define _VPF_mem_paging 4 +#define VPF_mem_paging (1UL<<_VPF_mem_paging) + /* VCPU is blocked due to missing mem_access ring. */ +#define _VPF_mem_access 5 +#define VPF_mem_access (1UL<<_VPF_mem_access) static inline int vcpu_runnable(struct vcpu *v) {
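To summarise the producer path for reviewers, here is a condensed sketch
(not the literal patch) of how requests reach the ring after this change,
using the names from the diff above; ring copying and error reporting are
trimmed to the essentials.

/* Sketch: guest vcpus sleep on med->wq until the ring has room,
 * foreign vcpus rely on the slot claimed in mem_event_check_ring(). */
static int _mem_event_put_request(struct domain *d,
                                  struct mem_event_domain *med,
                                  mem_event_request_t *req)
{
    int free_requests;

    mem_event_ring_lock(med);
    free_requests = RING_FREE_REQUESTS(&med->front_ring);
    if ( (current->domain == d && free_requests < med->foreign_producers) ||
         !free_requests )
    {
        /* No usable slot: the remaining slots are reserved for foreign
         * producers, or the ring is simply full. */
        mem_event_ring_unlock(med);
        return 0;
    }

    /* ... copy *req into the ring, push it, drop foreign_producers ... */

    mem_event_ring_unlock(med);
    notify_via_xen_event_channel(d, med->xen_port);
    return 1;
}

void mem_event_put_request(struct domain *d, struct mem_event_domain *med,
                           mem_event_request_t *req)
{
    if ( current->domain == d )
    {
        /* Guest context: block until _mem_event_put_request() succeeds.
         * p2m_*_resume() calls mem_event_wake_requesters(), which does
         * wake_up_nr(&med->wq, RING_FREE_REQUESTS(...)). */
        wait_event(med->wq, _mem_event_put_request(d, med, req));
        return;
    }

    /* Foreign context cannot sleep; the claimed slot should guarantee room. */
    _mem_event_put_request(d, med, req);
}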
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322737507 -3600 # Node ID 8147822efdee65d1f5b94656ab2032aedb76979f # Parent 612f69531fd15cf59c58404f6e4762733a9a268c xenpaging: use wait queues Use a wait queue to put a guest vcpu to sleep while the requested gfn is in paging state. This adds missing p2m_mem_paging_populate() calls to some callers of the new get_gfn* variants, which would crash now because they get an invalid mfn. It also fixes guest crashes due to unexpected returns from do_memory_op because copy_to/from_guest ran into a paged gfn. Now those places will always get a valid mfn. Since each gfn could be requested by several guest vcpus at the same time a queue of paged gfns is maintained. Each vcpu will be attached to that queue. Once p2m_mem_paging_resume restored the gfn the waiting vcpus will resume execution. There is untested code in p2m_mem_paging_init_queue() to allow cpu hotplug. Since each vcpu may wait on a different gfn there have to be as many queues as vcpus. But xl vcpu-set does not seem to work right now, so this code path cant be excercised right now. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r 612f69531fd1 -r 8147822efdee xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -454,6 +454,8 @@ int hvm_domain_initialise(struct domain spin_lock_init(&d->arch.hvm_domain.irq_lock); spin_lock_init(&d->arch.hvm_domain.uc_lock); + spin_lock_init(&d->arch.hvm_domain.gfn_lock); + INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list); spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock); diff -r 612f69531fd1 -r 8147822efdee xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -30,6 +30,7 @@ #include <asm/p2m.h> #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */ #include <xen/iommu.h> +#include <xen/wait.h> #include <asm/mem_event.h> #include <public/mem_event.h> #include <asm/mem_sharing.h> @@ -144,6 +145,182 @@ void p2m_change_entry_type_global(struct p2m_unlock(p2m); } +#ifdef __x86_64__ +struct p2m_mem_paging_queue { + struct list_head list; + struct waitqueue_head wq; + unsigned long gfn; + unsigned short waiters; + unsigned short woken; + unsigned short index; +}; + +struct p2m_mem_paging_queue_head { + struct list_head list; + unsigned int max; +}; + +int p2m_mem_paging_init_queue(struct domain *d, unsigned int max) +{ + struct p2m_mem_paging_queue_head *h; + struct p2m_mem_paging_queue *q; + unsigned int i, nr; + int ret = 0; + + spin_lock(&d->arch.hvm_domain.gfn_lock); + + if (!d->arch.hvm_domain.gfn_queue) { + ret = -ENOMEM; + h = xzalloc(struct p2m_mem_paging_queue_head); + if (!h) { + domain_crash(d); + goto out; + } + + INIT_LIST_HEAD(&h->list); + nr = max; + } else { + h = d->arch.hvm_domain.gfn_queue; + if (max <= h->max) + goto out; + nr = max - h->max; + } + + ret = -ENOMEM; + q = xzalloc_array(struct p2m_mem_paging_queue, nr); + if (!q) { + if (!d->arch.hvm_domain.gfn_queue) + xfree(h); + domain_crash(d); + goto out; + } + + for (i = 0; i < nr; i++) { + init_waitqueue_head(&q[i].wq); + INIT_LIST_HEAD(&q[i].list); + q[i].index = h->max + i + 1; + list_add_tail(&q[i].list, &h->list); + } + + h->max = max; + d->arch.hvm_domain.gfn_queue = h; + ret = 0; + +out: + spin_unlock(&d->arch.hvm_domain.gfn_lock); + return ret; +} + +static struct p2m_mem_paging_queue *p2m_mem_paging_get_queue(struct domain *d, unsigned long gfn) +{ + struct p2m_mem_paging_queue_head *h; + struct p2m_mem_paging_queue *q, *q_match, *q_free; + + h = d->arch.hvm_domain.gfn_queue; + q_match = q_free = NULL; + + 
spin_lock(&d->arch.hvm_domain.gfn_lock); + + list_for_each_entry(q, &h->list, list) { + if (q->gfn == gfn) { + q_match = q; + break; + } + if (!q_free && !q->waiters) + q_free = q; + } + + if (!q_match && q_free) + q_match = q_free; + + if (q_match) { + if (q_match->woken) + printk("wq woken for gfn %u:%u %lx %u %u %u\n", current->domain->domain_id, current->vcpu_id, gfn, q_match->index, q_match->woken, q_match->waiters); + q_match->waiters++; + q_match->gfn = gfn; + } + + if (!q_match) + printk("No wq_get for gfn %u:%u %lx\n", current->domain->domain_id, current->vcpu_id, gfn); + + spin_unlock(&d->arch.hvm_domain.gfn_lock); + return q_match; +} + +static void p2m_mem_paging_put_queue(struct domain *d, struct p2m_mem_paging_queue *q_match) +{ + spin_lock(&d->arch.hvm_domain.gfn_lock); + + if (q_match->waiters == 0) + printk("wq_put no waiters, gfn %u:%u %lx %u\n", current->domain->domain_id, current->vcpu_id, q_match->gfn, q_match->woken); + else if (--q_match->waiters == 0) + q_match->gfn = q_match->woken = 0;; + + spin_unlock(&d->arch.hvm_domain.gfn_lock); +} + +static void p2m_mem_paging_wake_queue(struct domain *d, unsigned long gfn) +{ + struct p2m_mem_paging_queue_head *h; + struct p2m_mem_paging_queue *q, *q_match = NULL; + + spin_lock(&d->arch.hvm_domain.gfn_lock); + + h = d->arch.hvm_domain.gfn_queue; + list_for_each_entry(q, &h->list, list) { + if (q->gfn == gfn) { + q_match = q; + break; + } + } + if (q_match) { + if (q_match->woken || q_match->waiters == 0) + printk("Wrong wake for gfn %u:%u %p %lx %u %u\n", current->domain->domain_id, current->vcpu_id, q_match, gfn, q_match->woken, q_match->waiters); + q_match->woken++; + wake_up_all(&q_match->wq); + } + spin_unlock(&d->arch.hvm_domain.gfn_lock); +} + +/* Returns 0 if the gfn is still paged */ +static int p2m_mem_paging_get_entry(mfn_t *mfn, + struct p2m_domain *p2m, unsigned long gfn, + p2m_type_t *t, p2m_access_t *a, p2m_query_t q, + unsigned int *page_order) +{ + *mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order); + + return p2m_is_paging(*t) ? 
0 : 1; +} + +/* Go to sleep in case of guest access */ +static void p2m_mem_paging_wait(mfn_t *mfn, + struct p2m_domain *p2m, unsigned long gfn, + p2m_type_t *t, p2m_access_t *a, p2m_query_t q, + unsigned int *page_order) +{ + struct p2m_mem_paging_queue *pmpq; + + /* Return p2mt as is in case of query */ + if ( q == p2m_query ) + return; + /* Foreign domains can not go to sleep */ + if ( current->domain != p2m->domain ) + return; + + pmpq = p2m_mem_paging_get_queue(p2m->domain, gfn); + if ( !pmpq ) + return; + + /* Populate the page once */ + if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged ) + p2m_mem_paging_populate(p2m->domain, gfn); + + wait_event(pmpq->wq, p2m_mem_paging_get_entry(mfn, p2m, gfn, t, a, q, page_order)); + p2m_mem_paging_put_queue(p2m->domain, pmpq); +} +#endif + mfn_t get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn, p2m_type_t *t, p2m_access_t *a, p2m_query_t q, unsigned int *page_order) @@ -161,6 +338,11 @@ mfn_t get_gfn_type_access(struct p2m_dom mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order); #ifdef __x86_64__ + if ( unlikely(p2m_is_paging(*t)) ) + p2m_mem_paging_wait(&mfn, p2m, gfn, t, a, q, page_order); +#endif + +#ifdef __x86_64__ if ( q == p2m_unshare && p2m_is_shared(*t) ) { ASSERT(!p2m_is_nestedp2m(p2m)); @@ -914,54 +1096,42 @@ void p2m_mem_paging_drop_page(struct dom void p2m_mem_paging_populate(struct domain *d, unsigned long gfn) { struct vcpu *v = current; - mem_event_request_t req; + mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn }; p2m_type_t p2mt; p2m_access_t a; mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); + int put_request = 0; /* Check that there''s space on the ring for this request */ if ( mem_event_check_ring(d, &d->mem_event->paging) ) return; - memset(&req, 0, sizeof(req)); - req.type = MEM_EVENT_TYPE_PAGING; - /* Fix p2m mapping */ p2m_lock(p2m); mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); - /* Allow only nominated or evicted pages to enter page-in path */ - if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged ) - { - /* Evict will fail now, tag this request for pager */ - if ( p2mt == p2m_ram_paging_out ) - req.flags |= MEM_EVENT_FLAG_EVICT_FAIL; + /* Forward the state only if gfn is in page-out path */ + if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged ) { + /* Ignore foreign requests to allow mmap in pager */ + if ( mfn_valid(mfn) && p2mt == p2m_ram_paging_out && v->domain == d ) { + /* Restore gfn because it is needed by guest before evict */ + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, + paging_mode_log_dirty(d) ? 
p2m_ram_logdirty : p2m_ram_rw, a); + } else { + set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in_start, a); + put_request = 1; + } + /* Evict will fail now, the pager has to try another gfn */ - set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in_start, a); audit_p2m(p2m, 1); } p2m_unlock(p2m); - /* Pause domain if request came from guest and gfn has paging type */ - if ( p2m_is_paging(p2mt) && v->domain == d ) - { - vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; - } - /* No need to inform pager if the gfn is not in the page-out path */ - else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) - { - /* gfn is already on its way back and vcpu is not paused */ + /* One request per gfn, guest vcpus go to sleep, foreigners try again */ + if ( put_request ) + mem_event_put_request(d, &d->mem_event->paging, &req); + else mem_event_put_req_producers(d, &d->mem_event->paging); - return; - } - - /* Send request to pager */ - req.gfn = gfn; - req.p2mt = p2mt; - req.vcpu_id = v->vcpu_id; - - mem_event_put_request(d, &d->mem_event->paging, &req); } /** @@ -1062,12 +1232,11 @@ void p2m_mem_paging_resume(struct domain p2m_unlock(p2m); } - /* Unpause domain */ - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Wake vcpus waiting for room in the ring */ mem_event_wake_requesters(&d->mem_event->paging); + + /* Unpause all vcpus that were paused because the gfn was paged */ + p2m_mem_paging_wake_queue(d, rsp.gfn); } void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, diff -r 612f69531fd1 -r 8147822efdee xen/common/domctl.c --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -547,6 +547,9 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc goto maxvcpu_out; } + if ( p2m_mem_paging_init_queue(d, max) ) + goto maxvcpu_out; + ret = 0; maxvcpu_out: diff -r 612f69531fd1 -r 8147822efdee xen/include/asm-x86/hvm/domain.h --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -91,6 +91,9 @@ struct hvm_domain { struct viridian_domain viridian; + spinlock_t gfn_lock; + struct p2m_mem_paging_queue_head *gfn_queue; + bool_t hap_enabled; bool_t mem_sharing_enabled; bool_t qemu_mapcache_invalidate; diff -r 612f69531fd1 -r 8147822efdee xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -468,6 +468,8 @@ p2m_pod_offline_or_broken_replace(struct /* Modify p2m table for shared gfn */ int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn); +/* Initialize per-gfn wait queue */ +int p2m_mem_paging_init_queue(struct domain *d, unsigned int max); /* Check if a nominated gfn is valid to be paged out */ int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn); /* Evict a frame */
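The central piece of this patch is the wait loop below: a condensed sketch
(not the literal patch) of how a guest access to a paged gfn now blocks in
get_gfn_type_access() until the pager has restored the page, using the
helper names from the diff above.

/* Sketch: attach the vcpu to the per-gfn queue, ask the pager to populate
 * the gfn once, then sleep until the p2m entry leaves the paging states.
 * p2m_mem_paging_resume() wakes the queue that matches the gfn. */
static void p2m_mem_paging_wait(mfn_t *mfn, struct p2m_domain *p2m,
                                unsigned long gfn, p2m_type_t *t,
                                p2m_access_t *a, p2m_query_t q,
                                unsigned int *page_order)
{
    struct p2m_mem_paging_queue *pmpq;

    if ( q == p2m_query )                  /* plain queries never block */
        return;
    if ( current->domain != p2m->domain )  /* foreign vcpus must not sleep */
        return;

    pmpq = p2m_mem_paging_get_queue(p2m->domain, gfn);
    if ( !pmpq )
        return;

    /* Send the populate request only once per gfn */
    if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
        p2m_mem_paging_populate(p2m->domain, gfn);

    /* The condition re-reads the p2m entry and returns non-zero once the
     * gfn is no longer in a paging state. */
    wait_event(pmpq->wq,
               p2m_mem_paging_get_entry(mfn, p2m, gfn, t, a, q, page_order));

    p2m_mem_paging_put_queue(p2m->domain, pmpq);
}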
Olaf Hering
2011-Dec-01 11:09 UTC
[PATCH 3 of 4] xenpaging: add need_populate and paged_no_mfn checks
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322737586 -3600 # Node ID c09ac3717a025a8ead44bbc795fedda715d134c7 # Parent 8147822efdee65d1f5b94656ab2032aedb76979f xenpaging: add need_populate and paged_no_mfn checks There is currently a mix of p2mt checks for the various paging types. Some mean the p2mt needs to be populated, others mean a gfn without mfn. Add a new p2m_do_populate() helper which covers the p2m_ram_paged and p2m_ram_paging_out types. If a gfn is not in these states anymore another populate request for the pager is not needed. This avoids a call to p2m_mem_paging_populate() which in turn reduces the pressure on the ring buffer because no temporary slot needs to be claimed. As such, this helper is an optimization. Modify the existing p2m_is_paged() helper which now covers also p2m_ram_paging_in_start in addition to the current p2m_ram_paged type. A gfn in these two states is not backed by a mfn. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/hvm/emulate.c --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -66,7 +66,8 @@ static int hvmemul_do_io( ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt); if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(curr->domain, ram_gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(curr->domain, ram_gfn); put_gfn(curr->domain, ram_gfn); return X86EMUL_RETRY; } diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -363,7 +363,8 @@ static int hvm_set_ioreq_page( } if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(d, gmfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(d, gmfn); put_gfn(d, gmfn); return -ENOENT; } @@ -1300,7 +1301,7 @@ int hvm_hap_nested_page_fault(unsigned l #ifdef __x86_64__ /* Check if the page has been paged out */ - if ( p2m_is_paged(p2mt) || (p2mt == p2m_ram_paging_out) ) + if ( p2m_do_populate(p2mt) ) p2m_mem_paging_populate(v->domain, gfn); /* Mem sharing: unshare the page and try again */ @@ -1820,7 +1821,8 @@ static void *__hvm_map_guest_frame(unsig } if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(d, gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(d, gfn); put_gfn(d, gfn); return NULL; } @@ -2280,7 +2282,8 @@ static enum hvm_copy_result __hvm_copy( if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(curr->domain, gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(curr->domain, gfn); put_gfn(curr->domain, gfn); return HVMCOPY_gfn_paged_out; } @@ -3760,7 +3763,8 @@ long do_hvm_op(unsigned long op, XEN_GUE mfn_t mfn = get_gfn_unshare(d, pfn, &t); if ( p2m_is_paging(t) ) { - p2m_mem_paging_populate(d, pfn); + if ( p2m_do_populate(t) ) + p2m_mem_paging_populate(d, pfn); put_gfn(d, pfn); rc = -EINVAL; goto param_fail3; @@ -3864,7 +3868,8 @@ long do_hvm_op(unsigned long op, XEN_GUE mfn = get_gfn_unshare(d, pfn, &t); if ( p2m_is_paging(t) ) { - p2m_mem_paging_populate(d, pfn); + if ( p2m_do_populate(t) ) + p2m_mem_paging_populate(d, pfn); put_gfn(d, pfn); rc = -EINVAL; goto param_fail4; diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm.c --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -3525,9 +3525,10 @@ int do_mmu_update( if ( !p2m_is_valid(p2mt) ) mfn = INVALID_MFN; - if ( p2m_is_paged(p2mt) ) + if ( p2m_is_paged(p2mt) && !mfn_valid(mfn) ) { - p2m_mem_paging_populate(pg_owner, gmfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(pg_owner, gmfn); put_gfn(pt_owner, gmfn); rc = -ENOENT; 
break; @@ -3565,15 +3566,10 @@ int do_mmu_update( l1emfn = mfn_x(get_gfn(pg_owner, l1egfn, &l1e_p2mt)); - if ( p2m_is_paged(l1e_p2mt) ) + if ( p2m_is_paged(l1e_p2mt) && !mfn_valid(l1emfn) ) { - p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); - put_gfn(pg_owner, l1egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l1e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l1e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); put_gfn(pg_owner, l1egfn); rc = -ENOENT; break; @@ -3613,15 +3609,10 @@ int do_mmu_update( l2emfn = mfn_x(get_gfn(pg_owner, l2egfn, &l2e_p2mt)); - if ( p2m_is_paged(l2e_p2mt) ) + if ( p2m_is_paged(l2e_p2mt) && !mfn_valid(l2emfn) ) { - p2m_mem_paging_populate(pg_owner, l2egfn); - put_gfn(pg_owner, l2egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l2e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l2e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l2egfn); put_gfn(pg_owner, l2egfn); rc = -ENOENT; break; @@ -3647,15 +3638,10 @@ int do_mmu_update( l3emfn = mfn_x(get_gfn(pg_owner, l3egfn, &l3e_p2mt)); - if ( p2m_is_paged(l3e_p2mt) ) + if ( p2m_is_paged(l3e_p2mt) && !mfn_valid(l3emfn) ) { - p2m_mem_paging_populate(pg_owner, l3egfn); - put_gfn(pg_owner, l3egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l3e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l3e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l3egfn); put_gfn(pg_owner, l3egfn); rc = -ENOENT; break; @@ -3681,15 +3667,10 @@ int do_mmu_update( l4emfn = mfn_x(get_gfn(pg_owner, l4egfn, &l4e_p2mt)); - if ( p2m_is_paged(l4e_p2mt) ) + if ( p2m_is_paged(l4e_p2mt) && !mfn_valid(l4emfn) ) { - p2m_mem_paging_populate(pg_owner, l4egfn); - put_gfn(pg_owner, l4egfn); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in_start == l4e_p2mt && !mfn_valid(mfn) ) - { + if ( p2m_do_populate(l4e_p2mt) ) + p2m_mem_paging_populate(pg_owner, l4egfn); put_gfn(pg_owner, l4egfn); rc = -ENOENT; break; diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/guest_walk.c --- a/xen/arch/x86/mm/guest_walk.c +++ b/xen/arch/x86/mm/guest_walk.c @@ -102,7 +102,8 @@ static inline void *map_domain_gfn(struc if ( p2m_is_paging(*p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); - p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); + if ( p2m_do_populate(*p2mt) ) + p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); __put_gfn(p2m, gfn_x(gfn)); *rc = _PAGE_PAGED; return NULL; diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/hap/guest_walk.c --- a/xen/arch/x86/mm/hap/guest_walk.c +++ b/xen/arch/x86/mm/hap/guest_walk.c @@ -64,7 +64,8 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); - p2m_mem_paging_populate(p2m->domain, cr3 >> PAGE_SHIFT); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(p2m->domain, cr3 >> PAGE_SHIFT); pfec[0] = PFEC_page_paged; __put_gfn(p2m, top_gfn); @@ -101,7 +102,8 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA if ( p2m_is_paging(p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); - p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(p2m->domain, gfn_x(gfn)); pfec[0] = PFEC_page_paged; __put_gfn(p2m, gfn_x(gfn)); diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/p2m-ept.c --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -375,8 +375,7 @@ ept_set_entry(struct p2m_domain *p2m, un * Read-then-write is OK because we hold the p2m lock. 
*/ old_entry = *ept_entry; - if ( mfn_valid(mfn_x(mfn)) || direct_mmio || p2m_is_paged(p2mt) || - (p2mt == p2m_ram_paging_in_start) ) + if ( mfn_valid(mfn_x(mfn)) || direct_mmio || p2m_is_paged(p2mt) ) { /* Construct the new entry, and then write it once */ new_entry.emt = epte_get_entry_emt(p2m->domain, gfn, mfn, &ipat, diff -r 8147822efdee -r c09ac3717a02 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -313,7 +313,7 @@ static void p2m_mem_paging_wait(mfn_t *m return; /* Populate the page once */ - if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged ) + if ( p2m_do_populate(*t) ) p2m_mem_paging_populate(p2m->domain, gfn); wait_event(pmpq->wq, p2m_mem_paging_get_entry(mfn, p2m, gfn, t, a, q, page_order)); @@ -1111,7 +1111,7 @@ void p2m_mem_paging_populate(struct doma p2m_lock(p2m); mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); /* Forward the state only if gfn is in page-out path */ - if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged ) { + if ( p2m_do_populate(p2mt) ) { /* Ignore foreign requests to allow mmap in pager */ if ( mfn_valid(mfn) && p2mt == p2m_ram_paging_out && v->domain == d ) { /* Restore gfn because it is needed by guest before evict */ diff -r 8147822efdee -r c09ac3717a02 xen/common/grant_table.c --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -163,7 +163,8 @@ static int __get_paged_frame(unsigned lo *frame = mfn_x(mfn); if ( p2m_is_paging(p2mt) ) { - p2m_mem_paging_populate(rd, gfn); + if ( p2m_do_populate(p2mt) ) + p2m_mem_paging_populate(rd, gfn); put_gfn(rd, gfn); rc = GNTST_eagain; } diff -r 8147822efdee -r c09ac3717a02 xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -158,7 +158,11 @@ typedef enum { | p2m_to_mask(p2m_ram_paging_in_start) \ | p2m_to_mask(p2m_ram_paging_in)) -#define P2M_PAGED_TYPES (p2m_to_mask(p2m_ram_paged)) +#define P2M_POPULATE_TYPES (p2m_to_mask(p2m_ram_paged) \ + | p2m_to_mask(p2m_ram_paging_out) ) + +#define P2M_PAGED_NO_MFN_TYPES (p2m_to_mask(p2m_ram_paged) \ + | p2m_to_mask(p2m_ram_paging_in_start) ) /* Shared types */ /* XXX: Sharable types could include p2m_ram_ro too, but we would need to @@ -183,7 +187,8 @@ typedef enum { #define p2m_has_emt(_t) (p2m_to_mask(_t) & (P2M_RAM_TYPES | p2m_to_mask(p2m_mmio_direct))) #define p2m_is_pageable(_t) (p2m_to_mask(_t) & P2M_PAGEABLE_TYPES) #define p2m_is_paging(_t) (p2m_to_mask(_t) & P2M_PAGING_TYPES) -#define p2m_is_paged(_t) (p2m_to_mask(_t) & P2M_PAGED_TYPES) +#define p2m_do_populate(_t) (p2m_to_mask(_t) & P2M_POPULATE_TYPES) +#define p2m_is_paged(_t) (p2m_to_mask(_t) & P2M_PAGED_NO_MFN_TYPES) #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES) #define p2m_is_shared(_t) (p2m_to_mask(_t) & P2M_SHARED_TYPES) #define p2m_is_broken(_t) (p2m_to_mask(_t) & P2M_BROKEN_TYPES)
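In short, the patch splits the old p2m_is_paged() check into two predicates.
The sketch below (macro names as in the diff above, the caller snippet is
illustrative) shows the masks and the caller idiom they enable.

/* gfn still needs a populate request sent to the pager */
#define P2M_POPULATE_TYPES     (p2m_to_mask(p2m_ram_paged) \
                              | p2m_to_mask(p2m_ram_paging_out))

/* gfn currently has no backing mfn */
#define P2M_PAGED_NO_MFN_TYPES (p2m_to_mask(p2m_ram_paged) \
                              | p2m_to_mask(p2m_ram_paging_in_start))

#define p2m_do_populate(_t) (p2m_to_mask(_t) & P2M_POPULATE_TYPES)
#define p2m_is_paged(_t)    (p2m_to_mask(_t) & P2M_PAGED_NO_MFN_TYPES)

/* Typical caller after this patch: only nudge the pager while the gfn is
 * still on the page-out path, otherwise just back off and retry. */
if ( p2m_is_paging(p2mt) )
{
    if ( p2m_do_populate(p2mt) )
        p2m_mem_paging_populate(d, gfn);
    put_gfn(d, gfn);
    return -ENOENT;   /* or X86EMUL_RETRY / GNTST_eagain, per call site */
}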
Olaf Hering
2011-Dec-01 11:09 UTC
[PATCH 4 of 4] Add tracing to mem_event and p2m_mem_paging functions
# HG changeset patch # User Olaf Hering <olaf@aepfle.de> # Date 1322737677 -3600 # Node ID 0bf827a48a3c9efd43fe04c291b553d2bec46f8b # Parent c09ac3717a025a8ead44bbc795fedda715d134c7 Add tracing to mem_event and p2m_mem_paging functions Maintaining these trace_var calls out-of-tree became a burden and made debugging paging related issues harder. So putting it into the tree will ease debugging with xenalyze. Signed-off-by: Olaf Hering <olaf@aepfle.de> diff -r c09ac3717a02 -r 0bf827a48a3c xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -24,6 +24,7 @@ #include <asm/domain.h> #include <xen/event.h> #include <xen/wait.h> +#include <xen/trace.h> #include <asm/p2m.h> #include <asm/mem_event.h> #include <asm/mem_paging.h> @@ -149,6 +150,13 @@ static int _mem_event_put_request(struct mem_event_front_ring_t *front_ring; int free_requests; RING_IDX req_prod; + struct trc_mem_event_put_request T = { .td = d->domain_id }; + int ret = 0; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.ring_bit = med->bit; + T.gfn = req->gfn; mem_event_ring_lock(med); @@ -156,7 +164,7 @@ static int _mem_event_put_request(struct /* Requests from foreign domain were claimed in mem_event_check_ring() */ if ((current->domain == d && free_requests < med->foreign_producers) || !free_requests) { mem_event_ring_unlock(med); - return 0; + goto out; } front_ring = &med->front_ring; @@ -176,7 +184,15 @@ static int _mem_event_put_request(struct notify_via_xen_event_channel(d, med->xen_port); - return 1; + ret = 1; + +out: + T.ret = ret; + T.room = free_requests; + T.foreign_producers = med->foreign_producers + ret; + trace_var(TRC_MEM_EVENT_PUT_REQUEST, 1, sizeof(T), &T); + + return ret; } void mem_event_put_request(struct domain *d, struct mem_event_domain *med, diff -r c09ac3717a02 -r 0bf827a48a3c xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -31,6 +31,7 @@ #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */ #include <xen/iommu.h> #include <xen/wait.h> +#include <xen/trace.h> #include <asm/mem_event.h> #include <public/mem_event.h> #include <asm/mem_sharing.h> @@ -215,7 +216,12 @@ static struct p2m_mem_paging_queue *p2m_ { struct p2m_mem_paging_queue_head *h; struct p2m_mem_paging_queue *q, *q_match, *q_free; + struct trc_mem_p2m_paging_queue T = { .td = d->domain_id }; + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + h = d->arch.hvm_domain.gfn_queue; q_match = q_free = NULL; @@ -238,19 +244,37 @@ static struct p2m_mem_paging_queue *p2m_ printk("wq woken for gfn %u:%u %lx %u %u %u\n", current->domain->domain_id, current->vcpu_id, gfn, q_match->index, q_match->woken, q_match->waiters); q_match->waiters++; q_match->gfn = gfn; + + T.waiters = q_match->waiters; + T.woken = q_match->woken; + T.index = q_match->index; } + trace_var(TRC_MEM_P2M_PAGING_GET_QUEUE, 1, sizeof(T), &T); + if (!q_match) printk("No wq_get for gfn %u:%u %lx\n", current->domain->domain_id, current->vcpu_id, gfn); + spin_unlock(&d->arch.hvm_domain.gfn_lock); return q_match; } static void p2m_mem_paging_put_queue(struct domain *d, struct p2m_mem_paging_queue *q_match) { + struct trc_mem_p2m_paging_queue T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = q_match->gfn; + T.waiters = q_match->waiters; + T.woken = q_match->woken; + T.index = q_match->index; + spin_lock(&d->arch.hvm_domain.gfn_lock); + trace_var(TRC_MEM_P2M_PAGING_PUT_QUEUE, 1, sizeof(T), &T); + if (q_match->waiters == 0) 
printk("wq_put no waiters, gfn %u:%u %lx %u\n", current->domain->domain_id, current->vcpu_id, q_match->gfn, q_match->woken); else if (--q_match->waiters == 0) @@ -263,6 +287,11 @@ static void p2m_mem_paging_wake_queue(st { struct p2m_mem_paging_queue_head *h; struct p2m_mem_paging_queue *q, *q_match = NULL; + struct trc_mem_p2m_paging_queue T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; spin_lock(&d->arch.hvm_domain.gfn_lock); @@ -277,8 +306,15 @@ static void p2m_mem_paging_wake_queue(st if (q_match->woken || q_match->waiters == 0) printk("Wrong wake for gfn %u:%u %p %lx %u %u\n", current->domain->domain_id, current->vcpu_id, q_match, gfn, q_match->woken, q_match->waiters); q_match->woken++; + + T.waiters = q_match->waiters; + T.woken = q_match->woken; + T.index = q_match->index; + wake_up_all(&q_match->wq); } + trace_var(TRC_MEM_P2M_PAGING_WAKE_QUEUE, 1, sizeof(T), &T); + spin_unlock(&d->arch.hvm_domain.gfn_lock); } @@ -937,34 +973,45 @@ int p2m_mem_paging_nominate(struct domai p2m_access_t a; mfn_t mfn; int ret; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; p2m_lock(p2m); mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); + T.reason++; /* Check if mfn is valid */ ret = -EINVAL; if ( !mfn_valid(mfn) ) goto out; + T.reason++; /* Check p2m type */ ret = -EAGAIN; if ( !p2m_is_pageable(p2mt) ) goto out; + T.reason++; /* Check for io memory page */ if ( is_iomem_page(mfn_x(mfn)) ) goto out; + T.reason++; /* Check page count and type */ page = mfn_to_page(mfn); if ( (page->count_info & (PGC_count_mask | PGC_allocated)) ! (1 | PGC_allocated) ) goto out; + T.reason++; if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_none ) goto out; + T.reason++; /* Fix p2m entry */ set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_out, a); audit_p2m(p2m, 1); @@ -972,6 +1019,10 @@ int p2m_mem_paging_nominate(struct domai out: p2m_unlock(p2m); + + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + trace_var(TRC_MEM_P2M_PAGING_NOMINATE, 1, sizeof(T), &T); return ret; } @@ -1002,6 +1053,11 @@ int p2m_mem_paging_evict(struct domain * mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); int ret = -EINVAL; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; p2m_lock(p2m); @@ -1010,24 +1066,29 @@ int p2m_mem_paging_evict(struct domain * if ( unlikely(!mfn_valid(mfn)) ) goto out; + T.reason++; /* Allow only nominated pages */ if ( p2mt != p2m_ram_paging_out ) goto out; + T.reason++; ret = -EBUSY; /* Get the page so it doesn''t get modified under Xen''s feet */ page = mfn_to_page(mfn); if ( unlikely(!get_page(page, d)) ) goto out; + T.reason++; /* Check page count and type once more */ if ( (page->count_info & (PGC_count_mask | PGC_allocated)) ! 
(2 | PGC_allocated) ) goto out_put; + T.reason++; if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_none ) goto out_put; + T.reason++; /* Decrement guest domain''s ref count of the page */ if ( test_and_clear_bit(_PGC_allocated, &page->count_info) ) put_page(page); @@ -1050,6 +1111,12 @@ int p2m_mem_paging_evict(struct domain * out: p2m_unlock(p2m); + + T.flag_evict_fail = !!ret; + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + trace_var(TRC_MEM_P2M_PAGING_EVICT, 1, sizeof(T), &T); + return ret; } @@ -1065,10 +1132,16 @@ int p2m_mem_paging_evict(struct domain * void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) { mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn }; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; /* Send release notification to pager */ req.flags = MEM_EVENT_FLAG_DROP_PAGE; + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + trace_var(TRC_MEM_P2M_PAGING_DROP, 1, sizeof(T), &T); + mem_event_put_request(d, &d->mem_event->paging, &req); } @@ -1101,11 +1174,13 @@ void p2m_mem_paging_populate(struct doma p2m_access_t a; mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); - int put_request = 0; + int put_request = 0, ring_full; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) ) - return; + ring_full = mem_event_check_ring(d, &d->mem_event->paging); + if ( ring_full ) + goto trace; /* Fix p2m mapping */ p2m_lock(p2m); @@ -1127,11 +1202,24 @@ void p2m_mem_paging_populate(struct doma } p2m_unlock(p2m); - /* One request per gfn, guest vcpus go to sleep, foreigners try again */ - if ( put_request ) - mem_event_put_request(d, &d->mem_event->paging, &req); - else - mem_event_put_req_producers(d, &d->mem_event->paging); + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + +trace: + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + T.reason = ring_full; + T.flag_drop_page = put_request; + trace_var(TRC_MEM_P2M_PAGING_POPULATE, 1, sizeof(T), &T); + + if ( !ring_full ) { + /* One request per gfn, guest vcpus go to sleep, foreigners try again */ + if ( put_request ) + mem_event_put_request(d, &d->mem_event->paging, &req); + else + mem_event_put_req_producers(d, &d->mem_event->paging); + } } /** @@ -1153,6 +1241,7 @@ int p2m_mem_paging_prep(struct domain *d mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); int ret; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; p2m_lock(p2m); @@ -1184,6 +1273,15 @@ int p2m_mem_paging_prep(struct domain *d out: p2m_unlock(p2m); + + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = gfn; + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; + T.reason = ret; + trace_var(TRC_MEM_P2M_PAGING_PREP, 1, sizeof(T), &T); + return ret; } @@ -1209,6 +1307,7 @@ void p2m_mem_paging_resume(struct domain p2m_type_t p2mt; p2m_access_t a; mfn_t mfn; + struct trc_mem_p2m_paging T = { .td = d->domain_id }; /* Pull the response off the ring */ mem_event_get_response(&d->mem_event->paging, &rsp); @@ -1230,8 +1329,16 @@ void p2m_mem_paging_resume(struct domain audit_p2m(p2m, 1); } p2m_unlock(p2m); + T.mfn = mfn_x(mfn); + T.p2mt = p2mt; } + T.d = current->domain->domain_id; + T.v = current->vcpu_id; + T.gfn = rsp.gfn; + T.flag_drop_page = !!(rsp.flags & MEM_EVENT_FLAG_DROP_PAGE); + trace_var(TRC_MEM_P2M_PAGING_RESUME, 1, sizeof(T), &T); + /* Wake vcpus waiting for room in the ring */ mem_event_wake_requesters(&d->mem_event->paging); diff -r c09ac3717a02 -r 
0bf827a48a3c xen/include/public/trace.h --- a/xen/include/public/trace.h +++ b/xen/include/public/trace.h @@ -94,6 +94,37 @@ #define TRC_MEM_POD_ZERO_RECLAIM (TRC_MEM + 17) #define TRC_MEM_POD_SUPERPAGE_SPLINTER (TRC_MEM + 18) +#define TRC_MEM_EVENT_PUT_REQUEST (TRC_MEM + 19) +struct trc_mem_event_put_request { + unsigned int d:16, v:16; + unsigned int td:16, room:16; + unsigned int gfn; + unsigned int foreign_producers:16, ret:1, ring_bit:5; +} __attribute__((packed)); + +#define TRC_MEM_P2M_PAGING_NOMINATE (TRC_MEM + 20) +#define TRC_MEM_P2M_PAGING_EVICT (TRC_MEM + 21) +#define TRC_MEM_P2M_PAGING_DROP (TRC_MEM + 22) +#define TRC_MEM_P2M_PAGING_POPULATE (TRC_MEM + 23) +#define TRC_MEM_P2M_PAGING_PREP (TRC_MEM + 24) +#define TRC_MEM_P2M_PAGING_RESUME (TRC_MEM + 25) +struct trc_mem_p2m_paging { + unsigned int d:16, v:16; + unsigned int td:16, p2mt:5, reason:5, flag_evict_fail:1, flag_drop_page:1; + unsigned int gfn; + unsigned int mfn; +} __attribute__((packed)); + +#define TRC_MEM_P2M_PAGING_GET_QUEUE (TRC_MEM + 26) +#define TRC_MEM_P2M_PAGING_PUT_QUEUE (TRC_MEM + 27) +#define TRC_MEM_P2M_PAGING_WAKE_QUEUE (TRC_MEM + 28) +struct trc_mem_p2m_paging_queue { + unsigned int d:16, v:16; + unsigned int td:16, woken:8; + unsigned int gfn; + unsigned int index:16, waiters:16; +} __attribute__((packed)); + #define TRC_PV_HYPERCALL (TRC_PV + 1) #define TRC_PV_TRAP (TRC_PV + 3)
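For orientation when reading the hooks above: each hook fills one of the
packed records from trace.h and hands it to trace_var() so xenalyze can
decode it. A minimal sketch of that pattern follows; the helper name is
made up for illustration, the fields and the trace_var() call mirror the
diff above.

#include <xen/trace.h>

/* Sketch: emit one paging trace record (illustrative helper only). */
static void trace_paging_drop(struct domain *d, unsigned long gfn)
{
    struct trc_mem_p2m_paging T = { .td = d->domain_id };

    T.d   = current->domain->domain_id;  /* domain/vcpu doing the drop */
    T.v   = current->vcpu_id;
    T.gfn = gfn;
    T.flag_drop_page = 1;

    /* second argument is 1, as in the hooks above, to timestamp the event */
    trace_var(TRC_MEM_P2M_PAGING_DROP, 1, sizeof(T), &T);
}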
At 12:09 +0100 on 01 Dec (1322741356), Olaf Hering wrote:
> The following series is my current state of an attempt to use wait queues for
> xenpaging and mem_event handling. I post it here for review and comments.
>
> This series conflicts with work from Andres Lagar-Cavilla.

Can you be clear about which of Andres's patchsets you are objecting to?
I have five series (!) of his on my todo list right now.

Tim.
On Thu, Dec 01, Tim Deegan wrote:
> At 12:09 +0100 on 01 Dec (1322741356), Olaf Hering wrote:
> > The following series is my current state of an attempt to use wait queues for
> > xenpaging and mem_event handling. I post it here for review and comments.
> >
> > This series conflicts with work from Andres Lagar-Cavilla.
>
> Can you be clear about which of Andres's patchsets you are objecting to?
> I have five series (!) of his on my todo list right now.

The "Improve ring management for memory events" change which tries to
achieve the same result as my "mem_event: use wait queue when ring is full"
change. There is already a discussion about which way to choose.

Olaf
Hi,

Is there a more recent version of this patch? I see a newer version of
the wait-queues-for-mem-events one but not of this.

At 12:09 +0100 on 01 Dec (1322741358), Olaf Hering wrote:
> # HG changeset patch
> # User Olaf Hering <olaf@aepfle.de>
> # Date 1322737507 -3600
> # Node ID 8147822efdee65d1f5b94656ab2032aedb76979f
> # Parent 612f69531fd15cf59c58404f6e4762733a9a268c
> xenpaging: use wait queues
>
> Use a wait queue to put a guest vcpu to sleep while the requested gfn is
> in paging state. This adds missing p2m_mem_paging_populate() calls to
> some callers of the new get_gfn* variants, which would crash now
> because they get an invalid mfn. It also fixes guest crashes due to
> unexpected returns from do_memory_op because copy_to/from_guest ran into
> a paged gfn. Now those places will always get a valid mfn.
>
> Since each gfn could be requested by several guest vcpus at the same
> time a queue of paged gfns is maintained. Each vcpu will be attached to
> that queue. Once p2m_mem_paging_resume restored the gfn the waiting
> vcpus will resume execution.
>
> There is untested code in p2m_mem_paging_init_queue() to allow cpu
> hotplug. Since each vcpu may wait on a different gfn there have to be as
> many queues as vcpus. But xl vcpu-set does not seem to work right now,
> so this code path cant be excercised right now.
>
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>
> diff -r 612f69531fd1 -r 8147822efdee xen/arch/x86/hvm/hvm.c
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -454,6 +454,8 @@ int hvm_domain_initialise(struct domain
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);
>
> +    spin_lock_init(&d->arch.hvm_domain.gfn_lock);
> +

I agree with Andres's comments on this lock, and will go further. Please:

 - add it to the mm-locks.h system so we see any ordering violations in
   future patches.
 - add a detailed comment there explaining what exactly this lock protects.

Cheers,

Tim.
On Thu, Dec 15, Tim Deegan wrote:
> Is there a more recent version of this patch? I see a newer version of
> the wait-queues-for-mem-events one but not of this.

The version I sent is not final; PV guests won't start, for example. I will
address the comments from Andres and you and send out a better version.

Olaf