Andres Lagar-Cavilla
2012-Jan-11 18:28 UTC
[PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
xen/arch/x86/hvm/hvm.c | 18 +- xen/arch/x86/mm/mem_event.c | 298 +++++++++++++++++++++++++++++++++------ xen/arch/x86/mm/mem_sharing.c | 30 +-- xen/arch/x86/mm/p2m.c | 81 +++++----- xen/common/memory.c | 7 +- xen/include/asm-x86/mem_event.h | 22 +- xen/include/asm-x86/p2m.h | 12 +- xen/include/xen/mm.h | 2 + xen/include/xen/sched.h | 22 ++- 9 files changed, 359 insertions(+), 133 deletions(-) This patch is an amalgamation of the work done by Olaf Hering <olaf@aepfle.de> and our work. It combines logic changes to simplify the memory event API, as well as leveraging wait queues to deal with extreme conditions in which too many events are generated by a guest vcpu. In order to generate a new event, a slot in the ring is claimed. If a guest vcpu is generating the event and there is no space, it is put on a wait queue. If a foreign vcpu is generating the event and there is no space, the vcpu is expected to retry its operation. If an error happens later, the function returns the claimed slot via a cancel operation. Thus, the API has only four calls: claim slot, cancel claimed slot, put request in the ring, consume the response. With all these mechanisms, no guest events are lost. Our testing includes 1. ballooning down 512 MiBs; 2. using mem access n2rwx, in which every page access in a four-vCPU guest results in an event, with no vCPU pausing, and the four vCPUs touching all RAM. No guest events were lost in either case, and qemu-dm had no mapping problems. For full disclosure, here is Olaf''s log of changes: v7: - add ring accounting so that each vcpu can place at least one request v6: - take foreign requests into account before calling wake_up_nr() - call wake_up_nr() outside of ring lock - rename ->bit to ->pause_flag - update comment in mem_event_enable() v5: - rename mem_event_check_ring() to mem_event_claim_slot() - rename mem_event_put_req_producers() to mem_event_release_slot() - add local/foreign request accounting - keep room for at least one guest request v4: - fix off-by-one bug in _mem_event_put_request - add mem_event_wake_requesters() and use wake_up_nr() - rename mem_event_mark_and_pause() and mem_event_mark_and_pause() functions - req_producers counts foreign request producers, rename member v3: - rename ->mem_event_bit to ->bit - remove me_ from new VPF_ defines v2: - p2m_mem_paging_populate: move vcpu_pause after put_request, otherwise the vcpu will not wake_up after a wait_event because the pause_count was increased twice. Fixes guest hangs. - update free space check in _mem_event_put_request() - simplify mem_event_put_request() Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4188,25 +4188,31 @@ static int hvm_memory_event_traps(long p unsigned long value, unsigned long old, bool_t gla_valid, unsigned long gla) { + int rc; struct vcpu* v = current; struct domain *d = v->domain; mem_event_request_t req; - int rc; if ( !(p & HVMPME_MODE_MASK) ) return 0; if ( (p & HVMPME_onchangeonly) && (value == old) ) return 1; - - rc = mem_event_check_ring(d, &d->mem_event->access); - if ( rc ) + + rc = mem_event_claim_slot(d, &d->mem_event->access); + if ( rc == -ENOSYS ) + { + /* If there was no ring to handle the event, then + * simple continue executing normally. 
*/ + return 1; + } + else if ( rc < 0 ) return rc; - + memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; req.reason = reason; - + if ( (p & HVMPME_MODE_MASK) == HVMPME_mode_sync ) { req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -23,6 +23,7 @@ #include <asm/domain.h> #include <xen/event.h> +#include <xen/wait.h> #include <asm/p2m.h> #include <asm/mem_event.h> #include <asm/mem_paging.h> @@ -41,6 +42,7 @@ static int mem_event_enable( struct domain *d, xen_domctl_mem_event_op_t *mec, struct mem_event_domain *med, + int pause_flag, xen_event_channel_notification_t notification_fn) { int rc; @@ -93,6 +95,9 @@ static int mem_event_enable( put_gfn(dom_mem_event, ring_gfn); put_gfn(dom_mem_event, shared_gfn); + /* Set the number of currently blocked vCPUs to 0. */ + med->blocked = 0; + /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], current->domain->domain_id, @@ -110,8 +115,11 @@ static int mem_event_enable( mem_event_ring_lock_init(med); - /* Wake any VCPUs paused for memory events */ - mem_event_unpause_vcpus(d); + /* Save the pause flag for this particular ring. */ + med->pause_flag = pause_flag; + + /* Initialize the last-chance wait queue. */ + init_waitqueue_head(&med->wq); return 0; @@ -125,48 +133,218 @@ static int mem_event_enable( return rc; } -static int mem_event_disable(struct mem_event_domain *med) +static int mem_event_ring_available(struct mem_event_domain *med) { - unmap_domain_page(med->ring_page); - med->ring_page = NULL; + int avail_req = RING_FREE_REQUESTS(&med->front_ring); + avail_req -= med->target_producers; + avail_req -= med->foreign_producers; - unmap_domain_page(med->shared_page); - med->shared_page = NULL; + BUG_ON(avail_req < 0); + + return avail_req; +} + +/* + * mem_event_wake_requesters() will wakeup vcpus waiting for room in the + * ring. These vCPUs were paused on their way out after placing an event, + * but need to be resumed where the ring is capable of processing at least + * one event from them. + */ +static void mem_event_wake_blocked(struct domain *d, struct mem_event_domain *med) +{ + struct vcpu *v; + int online = d->max_vcpus; + int avail_req = mem_event_ring_available(med); + + if( avail_req <= 0 || med->blocked == 0 ) + return; + + /* + * We ensure that we only have vCPUs online if there are enough free slots + * for their memory events to be processed. This will ensure that no + * memory events are lost (due to the fact that certain types of events + * cannot be replayed, we need to ensure that there is space in the ring + * for when they are hit). + * See comment below in mem_event_put_request(). 
+ */ + for_each_vcpu ( d, v ) + if ( test_bit(med->pause_flag, &v->pause_flags) ) + online--; + + ASSERT(online == (d->max_vcpus - med->blocked)); + + /* We remember which vcpu last woke up to avoid scanning always linearly + * from zero and starving higher-numbered vcpus under high load */ + if ( d->vcpu ) + { + int i, j, k; + + for (i = med->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++) + { + k = i % d->max_vcpus; + v = d->vcpu[k]; + if ( !v ) + continue; + + if ( !(med->blocked) || online >= avail_req ) + break; + + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) + { + vcpu_unpause(v); + online++; + med->blocked--; + med->last_vcpu_wake_up = k; + } + } + } +} + +/* + * In the event that a vCPU attempted to place an event in the ring and + * was unable to do so, it is queued on a wait queue. These are woken as + * needed, and take precedence over the blocked vCPUs. + */ +static void mem_event_wake_queued(struct domain *d, struct mem_event_domain *med) +{ + int avail_req = mem_event_ring_available(med); + + if ( avail_req > 0 ) + wake_up_nr(&med->wq, avail_req); +} + +/* + * mem_event_wake_waiters() will wakeup all vcpus waiting for the ring to + * become available. If we have queued vCPUs, they get top priority. We + * are guaranteed that they will go through code paths that will eventually + * call mem_event_wake() again, ensuring that any blocked vCPUs will get + * unpaused once all the queued vCPUs have made it through. + */ +void mem_event_wake(struct domain *d, struct mem_event_domain *med) +{ + if (!list_empty(&med->wq.list)) + mem_event_wake_queued(d, med); + else + mem_event_wake_blocked(d, med); +} + +static int mem_event_disable(struct domain *d, struct mem_event_domain *med) +{ + if( med->ring_page ) + { + struct vcpu *v; + + mem_event_ring_lock(med); + + if (!list_empty(&med->wq.list)) + { + mem_event_ring_unlock(med); + return -EBUSY; + } + + unmap_domain_page(med->ring_page); + med->ring_page = NULL; + + unmap_domain_page(med->shared_page); + med->shared_page = NULL; + + /* Wake up everybody */ + wake_up_all(&med->wq); + + /* Unblock all vCPUs */ + for_each_vcpu ( d, v ) + { + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) + { + vcpu_unpause(v); + med->blocked--; + } + } + + mem_event_ring_unlock(med); + } return 0; } -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +static inline void mem_event_release_slot(struct domain *d, + struct mem_event_domain *med) +{ + /* Update the accounting */ + if ( current->domain == d ) + med->target_producers--; + else + med->foreign_producers--; + + /* Kick any waiters */ + mem_event_wake(d, med); +} + +/* + * mem_event_mark_and_pause() tags vcpu and put it to sleep. + * The vcpu will resume execution in mem_event_wake_waiters(). + */ +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med) +{ + if ( !test_and_set_bit(med->pause_flag, &v->pause_flags) ) + { + vcpu_pause_nosync(v); + med->blocked++; + } +} + +/* + * This must be preceeded by a call to claim_slot(), and is guaranteed to + * succeed. As a side-effect however, the vCPU may be paused if the ring is + * overly full and its continued execution would cause stalling and excessive + * waiting. The vCPU will be automatically unpaused when the ring clears. 
+ */ +void mem_event_put_request(struct domain *d, + struct mem_event_domain *med, + mem_event_request_t *req) { mem_event_front_ring_t *front_ring; + int free_req, avail_req; RING_IDX req_prod; - mem_event_ring_lock(med); - - front_ring = &med->front_ring; - req_prod = front_ring->req_prod_pvt; - if ( current->domain != d ) { req->flags |= MEM_EVENT_FLAG_FOREIGN; ASSERT( !(req->flags & MEM_EVENT_FLAG_VCPU_PAUSED) ); } + mem_event_ring_lock(med); + + /* Due to the reservations, this step must succeed. */ + front_ring = &med->front_ring; + free_req = RING_FREE_REQUESTS(front_ring); + ASSERT(free_req > 0); + /* Copy request */ + req_prod = front_ring->req_prod_pvt; memcpy(RING_GET_REQUEST(front_ring, req_prod), req, sizeof(*req)); req_prod++; /* Update ring */ - med->req_producers--; front_ring->req_prod_pvt = req_prod; RING_PUSH_REQUESTS(front_ring); + /* We''ve actually *used* our reservation, so release the slot. */ + mem_event_release_slot(d, med); + + /* Give this vCPU a black eye if necessary, on the way out. + * See the comments above wake_blocked() for more information + * on how this mechanism works to avoid waiting. */ + avail_req = mem_event_ring_available(med); + if( current->domain == d && avail_req < d->max_vcpus ) + mem_event_mark_and_pause(current, med); + mem_event_ring_unlock(med); notify_via_xen_event_channel(d, med->xen_port); } -int mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp) +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, mem_event_response_t *rsp) { mem_event_front_ring_t *front_ring; RING_IDX rsp_cons; @@ -190,57 +368,81 @@ int mem_event_get_response(struct mem_ev front_ring->rsp_cons = rsp_cons; front_ring->sring->rsp_event = rsp_cons + 1; + /* Kick any waiters -- since we''ve just consumed an event, + * there may be additional space available in the ring. */ + mem_event_wake(d, med); + mem_event_ring_unlock(med); return 1; } -void mem_event_unpause_vcpus(struct domain *d) -{ - struct vcpu *v; - - for_each_vcpu ( d, v ) - if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) - vcpu_wake(v); -} - -void mem_event_mark_and_pause(struct vcpu *v) -{ - set_bit(_VPF_mem_event, &v->pause_flags); - vcpu_sleep_nosync(v); -} - -void mem_event_put_req_producers(struct mem_event_domain *med) +void mem_event_cancel_slot(struct domain *d, struct mem_event_domain *med) { mem_event_ring_lock(med); - med->req_producers--; + mem_event_release_slot(d, med); mem_event_ring_unlock(med); } -int mem_event_check_ring(struct domain *d, struct mem_event_domain *med) +static int mem_event_grab_slot(struct mem_event_domain *med, int foreign) { - struct vcpu *curr = current; - int free_requests; - int ring_full = 1; + int avail_req; if ( !med->ring_page ) - return -1; + return -ENOSYS; mem_event_ring_lock(med); - free_requests = RING_FREE_REQUESTS(&med->front_ring); - if ( med->req_producers < free_requests ) + avail_req = mem_event_ring_available(med); + if( avail_req <= 0 ) { - med->req_producers++; - ring_full = 0; + mem_event_ring_unlock(med); + return -EBUSY; } - if ( ring_full && (curr->domain == d) ) - mem_event_mark_and_pause(curr); + if ( !foreign ) + med->target_producers++; + else + med->foreign_producers++; mem_event_ring_unlock(med); - return ring_full; + return 0; +} + +/* Simple try_grab wrapper for use in the wait_event() macro. 
*/ +static int mem_event_wait_try_grab(struct mem_event_domain *med, int *rc) +{ + *rc = mem_event_grab_slot(med, 0); + return *rc; +} + +/* Call mem_event_grab_slot() until the ring doesn''t exist, or is available. */ +static int mem_event_wait_slot(struct mem_event_domain *med) +{ + int rc = -EBUSY; + wait_event(med->wq, mem_event_wait_try_grab(med, &rc) != -EBUSY); + return rc; +} + +/* + * Determines whether or not the current vCPU belongs to the target domain, + * and calls the appropriate wait function. If it is a guest vCPU, then we + * use mem_event_wait_slot() to reserve a slot. As long as there is a ring, + * this function will always return 0 for a guest. For a non-guest, we check + * for space and return -EBUSY if the ring is not available. + * + * Return codes: -ENOSYS: the ring is not yet configured + * -EBUSY: the ring is busy + * 0: a spot has been reserved + * + */ +int mem_event_claim_slot(struct domain *d, struct mem_event_domain *med) +{ + if ( current->domain == d ) + return mem_event_wait_slot(med); + else + return mem_event_grab_slot(med, 1); } /* Registered with Xen-bound event channel for incoming notifications. */ @@ -316,14 +518,14 @@ int mem_event_domctl(struct domain *d, x if ( p2m->pod.entry_count ) break; - rc = mem_event_enable(d, mec, med, mem_paging_notification); + rc = mem_event_enable(d, mec, med, _VPF_mem_paging, mem_paging_notification); } break; case XEN_DOMCTL_MEM_EVENT_OP_PAGING_DISABLE: { if ( med->ring_page ) - rc = mem_event_disable(med); + rc = mem_event_disable(d, med); } break; @@ -355,14 +557,14 @@ int mem_event_domctl(struct domain *d, x if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) break; - rc = mem_event_enable(d, mec, med, mem_access_notification); + rc = mem_event_enable(d, mec, med, _VPF_mem_access, mem_access_notification); } break; case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE: { if ( med->ring_page ) - rc = mem_event_disable(med); + rc = mem_event_disable(d, med); } break; diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -253,18 +253,10 @@ static void mem_sharing_audit(void) #endif -static struct page_info* mem_sharing_alloc_page(struct domain *d, - unsigned long gfn) +static void mem_sharing_notify_helper(struct domain *d, unsigned long gfn) { - struct page_info* page; struct vcpu *v = current; - mem_event_request_t req; - - page = alloc_domheap_page(d, 0); - if(page != NULL) return page; - - memset(&req, 0, sizeof(req)); - req.type = MEM_EVENT_TYPE_SHARED; + mem_event_request_t req = { .type = MEM_EVENT_TYPE_SHARED }; if ( v->domain != d ) { @@ -275,20 +267,21 @@ static struct page_info* mem_sharing_all gdprintk(XENLOG_ERR, "Failed alloc on unshare path for foreign (%d) lookup\n", d->domain_id); - return page; + return; } + if (mem_event_claim_slot(d, &d->mem_event->share) < 0) + { + return; + } + + req.flags = MEM_EVENT_FLAG_VCPU_PAUSED; vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; - - if(mem_event_check_ring(d, &d->mem_event->share)) return page; req.gfn = gfn; req.p2mt = p2m_ram_shared; req.vcpu_id = v->vcpu_id; mem_event_put_request(d, &d->mem_event->share, &req); - - return page; } unsigned int mem_sharing_get_nr_saved_mfns(void) @@ -301,7 +294,7 @@ int mem_sharing_sharing_resume(struct do mem_event_response_t rsp; /* Get all requests off the ring */ - while ( mem_event_get_response(&d->mem_event->share, &rsp) ) + while ( mem_event_get_response(d, &d->mem_event->share, &rsp) ) { if ( rsp.flags & 
MEM_EVENT_FLAG_DUMMY ) continue; @@ -658,13 +651,14 @@ gfn_found: if(ret == 0) goto private_page_found; old_page = page; - page = mem_sharing_alloc_page(d, gfn); + page = alloc_domheap_page(d, 0); if(!page) { /* We''ve failed to obtain memory for private page. Need to re-add the * gfn_info to relevant list */ list_add(&gfn_info->list, &hash_entry->gfns); put_gfn(d, gfn); + mem_sharing_notify_helper(d, gfn); shr_unlock(); return -ENOMEM; } diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -859,22 +859,26 @@ int p2m_mem_paging_evict(struct domain * * released by the guest. The pager is supposed to drop its reference of the * gfn. */ -void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) +int p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) { - struct vcpu *v = current; mem_event_request_t req; - /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) == 0) - { - /* Send release notification to pager */ - memset(&req, 0, sizeof(req)); - req.flags |= MEM_EVENT_FLAG_DROP_PAGE; - req.gfn = gfn; - req.vcpu_id = v->vcpu_id; + /* We allow no ring in this unique case, because it won''t affect + * correctness of the guest execution at this point. If this is the only + * page that happens to be paged-out, we''ll be okay.. but it''s likely the + * guest will crash shortly anyways. */ + int rc = mem_event_claim_slot(d, &d->mem_event->paging); + if ( rc < 0 ) + return rc; - mem_event_put_request(d, &d->mem_event->paging, &req); - } + /* Send release notification to pager */ + memset(&req, 0, sizeof(req)); + req.type = MEM_EVENT_TYPE_PAGING; + req.gfn = gfn; + req.flags = MEM_EVENT_FLAG_DROP_PAGE; + + mem_event_put_request(d, &d->mem_event->paging, &req); + return 0; } /** @@ -898,7 +902,7 @@ void p2m_mem_paging_drop_page(struct dom * already sent to the pager. In this case the caller has to try again until the * gfn is fully paged in again. */ -void p2m_mem_paging_populate(struct domain *d, unsigned long gfn) +int p2m_mem_paging_populate(struct domain *d, unsigned long gfn) { struct vcpu *v = current; mem_event_request_t req; @@ -907,9 +911,17 @@ void p2m_mem_paging_populate(struct doma mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); - /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) ) - return; + /* We''re paging. 
There should be a ring */ + int rc = mem_event_claim_slot(d, &d->mem_event->paging); + if ( rc == -ENOSYS ) + { + gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring " + "in place\n", d->domain_id, gfn); + domain_crash(d); + return 0; + } + else if ( rc < 0 ) + return rc; memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_PAGING; @@ -929,7 +941,7 @@ void p2m_mem_paging_populate(struct doma p2m_unlock(p2m); /* Pause domain if request came from guest and gfn has paging type */ - if ( p2m_is_paging(p2mt) && v->domain == d ) + if ( p2m_is_paging(p2mt) && v->domain == d ) { vcpu_pause_nosync(v); req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; @@ -938,8 +950,8 @@ void p2m_mem_paging_populate(struct doma else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) { /* gfn is already on its way back and vcpu is not paused */ - mem_event_put_req_producers(&d->mem_event->paging); - return; + mem_event_cancel_slot(d, &d->mem_event->paging); + return 0; } /* Send request to pager */ @@ -948,6 +960,7 @@ void p2m_mem_paging_populate(struct doma req.vcpu_id = v->vcpu_id; mem_event_put_request(d, &d->mem_event->paging, &req); + return 0; } /** @@ -1065,7 +1078,7 @@ void p2m_mem_paging_resume(struct domain mfn_t mfn; /* Pull all responses off the ring */ - while( mem_event_get_response(&d->mem_event->paging, &rsp) ) + while( mem_event_get_response(d, &d->mem_event->paging, &rsp) ) { if ( rsp.flags & MEM_EVENT_FLAG_DUMMY ) continue; @@ -1090,9 +1103,6 @@ void p2m_mem_paging_resume(struct domain if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); } - - /* Unpause any domains that were paused because the ring was full */ - mem_event_unpause_vcpus(d); } bool_t p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, @@ -1103,7 +1113,6 @@ bool_t p2m_mem_access_check(unsigned lon unsigned long gfn = gpa >> PAGE_SHIFT; struct domain *d = v->domain; struct p2m_domain* p2m = p2m_get_hostp2m(d); - int res; mfn_t mfn; p2m_type_t p2mt; p2m_access_t p2ma; @@ -1126,17 +1135,16 @@ bool_t p2m_mem_access_check(unsigned lon p2m_unlock(p2m); /* Otherwise, check if there is a memory event listener, and send the message along */ - res = mem_event_check_ring(d, &d->mem_event->access); - if ( res < 0 ) + if ( mem_event_claim_slot(d, &d->mem_event->access) == -ENOSYS ) { /* No listener */ if ( p2m->access_required ) { - printk(XENLOG_INFO - "Memory access permissions failure, no mem_event listener: pausing VCPU %d, dom %d\n", - v->vcpu_id, d->domain_id); - - mem_event_mark_and_pause(v); + gdprintk(XENLOG_INFO, "Memory access permissions failure, " + "no mem_event listener VCPU %d, dom %d\n", + v->vcpu_id, d->domain_id); + domain_crash(v->domain); + return 0; } else { @@ -1149,11 +1157,7 @@ bool_t p2m_mem_access_check(unsigned lon } return 1; } - - return 0; } - else if ( res > 0 ) - return 0; /* No space in buffer; VCPU paused */ memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; @@ -1188,7 +1192,7 @@ void p2m_mem_access_resume(struct domain mem_event_response_t rsp; /* Pull all responses off the ring */ - while( mem_event_get_response(&d->mem_event->access, &rsp) ) + while( mem_event_get_response(d, &d->mem_event->access, &rsp) ) { if ( rsp.flags & MEM_EVENT_FLAG_DUMMY ) continue; @@ -1196,13 +1200,8 @@ void p2m_mem_access_resume(struct domain if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); } - - /* Unpause any domains that were paused because the ring was full or no listener - * was available */ - 
mem_event_unpause_vcpus(d); } - /* Set access type for a region of pfns. * If start_pfn == -1ul, sets the default access type */ int p2m_set_mem_access(struct domain *d, unsigned long start_pfn, diff -r d3342b370b6f -r a85e7d46401b xen/common/memory.c --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -165,10 +165,13 @@ int guest_remove_page(struct domain *d, mfn = mfn_x(get_gfn(d, gmfn, &p2mt)); if ( unlikely(p2m_is_paging(p2mt)) ) { + int rc; guest_physmap_remove_page(d, gmfn, mfn, 0); - p2m_mem_paging_drop_page(d, gmfn); + rc = p2m_mem_paging_drop_page(d, gmfn); put_gfn(d, gmfn); - return 1; + /* drop_page returns zero on success, we return 1 on success. + * drop_page returns negative on error, never returns 1. */ + return rc ? rc : 1; } #else mfn = gmfn_to_mfn(d, gmfn); diff -r d3342b370b6f -r a85e7d46401b xen/include/asm-x86/mem_event.h --- a/xen/include/asm-x86/mem_event.h +++ b/xen/include/asm-x86/mem_event.h @@ -24,18 +24,24 @@ #ifndef __MEM_EVENT_H__ #define __MEM_EVENT_H__ -/* Pauses VCPU while marking pause flag for mem event */ -void mem_event_mark_and_pause(struct vcpu *v); -int mem_event_check_ring(struct domain *d, struct mem_event_domain *med); -void mem_event_put_req_producers(struct mem_event_domain *med); -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req); -int mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp); -void mem_event_unpause_vcpus(struct domain *d); +/* Returns 0 on success, -ENOSYS if there is no ring, -EBUSY if there is no + * available space. For success or -EBUSY, the vCPU may be left blocked + * temporarily to ensure that the ring does not lose future events. In + * general, you must follow a claim_slot() call with either put_request() or + * cancel_slot(), both of which are guaranteed to succeed. 
*/ +int mem_event_claim_slot(struct domain *d, struct mem_event_domain *med); + +void mem_event_cancel_slot(struct domain *d, struct mem_event_domain *med); + +void mem_event_put_request(struct domain *d, struct mem_event_domain *med, + mem_event_request_t *req); + +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, + mem_event_response_t *rsp); int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl); - #endif /* __MEM_EVENT_H__ */ diff -r d3342b370b6f -r a85e7d46401b xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -478,18 +478,18 @@ int p2m_mem_paging_nominate(struct domai /* Evict a frame */ int p2m_mem_paging_evict(struct domain *d, unsigned long gfn); /* Tell xenpaging to drop a paged out frame */ -void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn); +int p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn); /* Start populating a paged out frame */ -void p2m_mem_paging_populate(struct domain *d, unsigned long gfn); +int p2m_mem_paging_populate(struct domain *d, unsigned long gfn); /* Prepare the p2m for paging a frame in */ int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer); /* Resume normal operation (in case a domain was paused) */ void p2m_mem_paging_resume(struct domain *d); #else -static inline void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) -{ } -static inline void p2m_mem_paging_populate(struct domain *d, unsigned long gfn) -{ } +static inline int p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) +{ return 0; } +static inline int p2m_mem_paging_populate(struct domain *d, unsigned long gfn) +{ return 0; } #endif #ifdef __x86_64__ diff -r d3342b370b6f -r a85e7d46401b xen/include/xen/mm.h --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -318,6 +318,8 @@ page_list_splice(struct page_list_head * void scrub_one_page(struct page_info *); +/* Returns 1 on success, 0 on error, negative if the ring + * for event propagation is full in the presence of paging */ int guest_remove_page(struct domain *d, unsigned long gmfn); #define RAM_TYPE_CONVENTIONAL 0x00000001 diff -r d3342b370b6f -r a85e7d46401b xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -14,6 +14,7 @@ #include <xen/nodemask.h> #include <xen/radix-tree.h> #include <xen/multicall.h> +#include <xen/wait.h> #include <public/xen.h> #include <public/domctl.h> #include <public/sysctl.h> @@ -183,7 +184,9 @@ struct mem_event_domain { /* ring lock */ spinlock_t ring_lock; - unsigned int req_producers; + /* The ring has 64 entries */ + unsigned char foreign_producers; + unsigned char target_producers; /* shared page */ mem_event_shared_page_t *shared_page; /* shared ring page */ @@ -192,6 +195,14 @@ struct mem_event_domain mem_event_front_ring_t front_ring; /* event channel port (vcpu0 only) */ int xen_port; + /* mem_event bit for vcpu->pause_flags */ + int pause_flag; + /* list of vcpus waiting for room in the ring */ + struct waitqueue_head wq; + /* the number of vCPUs blocked */ + unsigned int blocked; + /* The last vcpu woken up */ + unsigned int last_vcpu_wake_up; }; struct mem_event_per_domain @@ -615,9 +626,12 @@ static inline struct domain *next_domain /* VCPU affinity has changed: migrating to a new CPU. */ #define _VPF_migrating 3 #define VPF_migrating (1UL<<_VPF_migrating) - /* VCPU is blocked on memory-event ring. 
*/ -#define _VPF_mem_event 4 -#define VPF_mem_event (1UL<<_VPF_mem_event) + /* VCPU is blocked due to missing mem_paging ring. */ +#define _VPF_mem_paging 4 +#define VPF_mem_paging (1UL<<_VPF_mem_paging) + /* VCPU is blocked due to missing mem_access ring. */ +#define _VPF_mem_access 5 +#define VPF_mem_access (1UL<<_VPF_mem_access) static inline int vcpu_runnable(struct vcpu *v) {
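In caller terms, the four-call API described above reduces to the pattern below. This is only a sketch modelled on the p2m_mem_paging callers in the diff: the function name send_paging_event() and the later_error check are illustrative placeholders, not part of the patch.

/* Producer-side sketch of the reworked API (illustrative only). */
static int send_paging_event(struct domain *d, unsigned long gfn,
                             int later_error)
{
    mem_event_request_t req;
    int rc;

    /* 1. Claim a slot.  A vcpu of the target domain may sleep on the
     *    ring's wait queue here; a foreign vcpu gets -EBUSY and is
     *    expected to retry its operation. */
    rc = mem_event_claim_slot(d, &d->mem_event->paging);
    if ( rc < 0 )
        return rc;    /* -ENOSYS: no ring configured; -EBUSY: retry */

    /* 2. If a later error prevents sending the event, give the slot back. */
    if ( later_error )
    {
        mem_event_cancel_slot(d, &d->mem_event->paging);
        return -EINVAL;    /* placeholder error */
    }

    /* 3. Guaranteed to succeed after a successful claim.  As a side
     *    effect, the current (guest) vcpu may be marked and paused on
     *    the way out if the ring is nearly full. */
    memset(&req, 0, sizeof(req));
    req.type = MEM_EVENT_TYPE_PAGING;
    req.gfn = gfn;
    mem_event_put_request(d, &d->mem_event->paging, &req);

    return 0;
}

On the consumer side the shape is unchanged: the resume paths drain the ring with while ( mem_event_get_response(d, med, &rsp) ), and every consumed response now calls mem_event_wake() so that queued and blocked vCPUs are let back in as space becomes available.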
Tim Deegan
2012-Jan-12 11:56 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
Hi, This looks fine to me; this time I''d like Olaf to review and test it before I commit it. :) A few comments inline below -- mostly nits about style. At 13:28 -0500 on 11 Jan (1326288504), Andres Lagar-Cavilla wrote:> Signed-off-by: Adin Scannell <adin@scannell.ca> > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>Housekeeping: please keep Olaf''s Signed-off-by: line to cover the parts of this patch that he''s certifying.> diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/hvm/hvm.c > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -4188,25 +4188,31 @@ static int hvm_memory_event_traps(long p > unsigned long value, unsigned long old, > bool_t gla_valid, unsigned long gla) > { > + int rc; > struct vcpu* v = current; > struct domain *d = v->domain; > mem_event_request_t req; > - int rc; >Please try to avoid this kind of unrelated shuffling (and I saw some whitespace changes later on as well). It''s not a big deal, but it makes reviewing a bit easier and avoids unnecessarily clashing with other patches.> diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/mm/mem_event.c > --- a/xen/arch/x86/mm/mem_event.c > +++ b/xen/arch/x86/mm/mem_event.c> + > +/* > + * mem_event_wake_requesters() will wakeup vcpus waiting for room in theDYM mem_event_wake_blocked() ?> + * ring. These vCPUs were paused on their way out after placing an event, > + * but need to be resumed where the ring is capable of processing at least > + * one event from them. > + */ > +static void mem_event_wake_blocked(struct domain *d, struct mem_event_domain *med) > +{ > + struct vcpu *v; > + int online = d->max_vcpus; > + int avail_req = mem_event_ring_available(med); > + > + if( avail_req <= 0 || med->blocked == 0 )s/if(/if (/> + return; > + > + /* > + * We ensure that we only have vCPUs online if there are enough free slots > + * for their memory events to be processed. This will ensure that no > + * memory events are lost (due to the fact that certain types of events > + * cannot be replayed, we need to ensure that there is space in the ring > + * for when they are hit). > + * See comment below in mem_event_put_request(). > + */ > + for_each_vcpu ( d, v ) > + if ( test_bit(med->pause_flag, &v->pause_flags) ) > + online--; > + > + ASSERT(online == (d->max_vcpus - med->blocked));Maybe better to do the cheap calculation as the default and the more expensive one for the ASSERT?> + /* We remember which vcpu last woke up to avoid scanning always linearly > + * from zero and starving higher-numbered vcpus under high load */ > + if ( d->vcpu ) > + { > + int i, j, k; > + > + for (i = med->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++) > + { > + k = i % d->max_vcpus; > + v = d->vcpu[k]; > + if ( !v ) > + continue; > + > + if ( !(med->blocked) || online >= avail_req ) > + break; > + > + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) > + { > + vcpu_unpause(v); > + online++; > + med->blocked--; > + med->last_vcpu_wake_up = k; > + } > + } > + } > +} > + > +/* > + * In the event that a vCPU attempted to place an event in the ring and > + * was unable to do so, it is queued on a wait queue. These are woken as > + * needed, and take precedence over the blocked vCPUs. 
> + */ > +static void mem_event_wake_queued(struct domain *d, struct mem_event_domain *med) > +{ > + int avail_req = mem_event_ring_available(med); > + > + if ( avail_req > 0 ) > + wake_up_nr(&med->wq, avail_req); > +} > + > +/* > + * mem_event_wake_waiters() will wakeup all vcpus waiting for the ring toDYM mem_event_wake() ?> + * become available. If we have queued vCPUs, they get top priority. We > + * are guaranteed that they will go through code paths that will eventually > + * call mem_event_wake() again, ensuring that any blocked vCPUs will get > + * unpaused once all the queued vCPUs have made it through. > + */ > +void mem_event_wake(struct domain *d, struct mem_event_domain *med) > +{ > + if (!list_empty(&med->wq.list)) > + mem_event_wake_queued(d, med); > + else > + mem_event_wake_blocked(d, med); > +} > + > +static int mem_event_disable(struct domain *d, struct mem_event_domain *med) > +{ > + if( med->ring_page )s/if(/if (/g :)> + { > + struct vcpu *v; > + > + mem_event_ring_lock(med); > + > + if (!list_empty(&med->wq.list))if ( ... )> + { > + mem_event_ring_unlock(med); > + return -EBUSY; > + } > + > + unmap_domain_page(med->ring_page); > + med->ring_page = NULL; > + > + unmap_domain_page(med->shared_page); > + med->shared_page = NULL; > + > + /* Wake up everybody */ > + wake_up_all(&med->wq); > + > + /* Unblock all vCPUs */ > + for_each_vcpu ( d, v ) > + { > + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) > + { > + vcpu_unpause(v); > + med->blocked--; > + } > + } > + > + mem_event_ring_unlock(med); > + } > > return 0; > } > > -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) > +static inline void mem_event_release_slot(struct domain *d, > + struct mem_event_domain *med) > +{ > + /* Update the accounting */ > + if ( current->domain == d ) > + med->target_producers--; > + else > + med->foreign_producers--; > + > + /* Kick any waiters */ > + mem_event_wake(d, med); > +} > + > +/* > + * mem_event_mark_and_pause() tags vcpu and put it to sleep. > + * The vcpu will resume execution in mem_event_wake_waiters(). > + */ > +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med) > +{ > + if ( !test_and_set_bit(med->pause_flag, &v->pause_flags) ) > + { > + vcpu_pause_nosync(v); > + med->blocked++; > + } > +} > + > +/* > + * This must be preceeded by a call to claim_slot(), and is guaranteed topreceded Cheers, Tim.
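For the ASSERT nit above, one possible reading of "do the cheap calculation as the default and the more expensive one for the ASSERT" is sketched below. This is illustrative only; neither revision of the patch is written this way.

    int online = d->max_vcpus - med->blocked;      /* cheap default */

#ifndef NDEBUG
    {
        struct vcpu *v;
        int counted = d->max_vcpus;

        /* Expensive cross-check: recount the blocked vcpus by flag. */
        for_each_vcpu ( d, v )
            if ( test_bit(med->pause_flag, &v->pause_flags) )
                counted--;
        ASSERT(counted == online);
    }
#endif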
Andres Lagar-Cavilla
2012-Jan-12 16:05 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
> Hi, > > This looks fine to me; this time I''d like Olaf to review and test it > before I commit it. :) > > A few comments inline below -- mostly nits about style.Great, will address them -- once Olaf gives us a green light. Andres> > At 13:28 -0500 on 11 Jan (1326288504), Andres Lagar-Cavilla wrote: >> Signed-off-by: Adin Scannell <adin@scannell.ca> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > > Housekeeping: please keep Olaf''s Signed-off-by: line to cover the parts > of this patch that he''s certifying. > >> diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/hvm/hvm.c >> --- a/xen/arch/x86/hvm/hvm.c >> +++ b/xen/arch/x86/hvm/hvm.c >> @@ -4188,25 +4188,31 @@ static int hvm_memory_event_traps(long p >> unsigned long value, unsigned long >> old, >> bool_t gla_valid, unsigned long gla) >> { >> + int rc; >> struct vcpu* v = current; >> struct domain *d = v->domain; >> mem_event_request_t req; >> - int rc; >> > > Please try to avoid this kind of unrelated shuffling (and I saw some > whitespace changes later on as well). It''s not a big deal, but it makes > reviewing a bit easier and avoids unnecessarily clashing with other > patches. > >> diff -r d3342b370b6f -r a85e7d46401b xen/arch/x86/mm/mem_event.c >> --- a/xen/arch/x86/mm/mem_event.c >> +++ b/xen/arch/x86/mm/mem_event.c > >> + >> +/* >> + * mem_event_wake_requesters() will wakeup vcpus waiting for room in >> the > > DYM mem_event_wake_blocked() ? > >> + * ring. These vCPUs were paused on their way out after placing an >> event, >> + * but need to be resumed where the ring is capable of processing at >> least >> + * one event from them. >> + */ >> +static void mem_event_wake_blocked(struct domain *d, struct >> mem_event_domain *med) >> +{ >> + struct vcpu *v; >> + int online = d->max_vcpus; >> + int avail_req = mem_event_ring_available(med); >> + >> + if( avail_req <= 0 || med->blocked == 0 ) > > s/if(/if (/ > >> + return; >> + >> + /* >> + * We ensure that we only have vCPUs online if there are enough >> free slots >> + * for their memory events to be processed. This will ensure that >> no >> + * memory events are lost (due to the fact that certain types of >> events >> + * cannot be replayed, we need to ensure that there is space in the >> ring >> + * for when they are hit). >> + * See comment below in mem_event_put_request(). >> + */ >> + for_each_vcpu ( d, v ) >> + if ( test_bit(med->pause_flag, &v->pause_flags) ) >> + online--; >> + >> + ASSERT(online == (d->max_vcpus - med->blocked)); > > Maybe better to do the cheap calculation as the default and the more > expensive one for the ASSERT? > >> + /* We remember which vcpu last woke up to avoid scanning always >> linearly >> + * from zero and starving higher-numbered vcpus under high load */ >> + if ( d->vcpu ) >> + { >> + int i, j, k; >> + >> + for (i = med->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; >> i++, j++) >> + { >> + k = i % d->max_vcpus; >> + v = d->vcpu[k]; >> + if ( !v ) >> + continue; >> + >> + if ( !(med->blocked) || online >= avail_req ) >> + break; >> + >> + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) >> + { >> + vcpu_unpause(v); >> + online++; >> + med->blocked--; >> + med->last_vcpu_wake_up = k; >> + } >> + } >> + } >> +} >> + >> +/* >> + * In the event that a vCPU attempted to place an event in the ring and >> + * was unable to do so, it is queued on a wait queue. These are woken >> as >> + * needed, and take precedence over the blocked vCPUs. 
>> + */ >> +static void mem_event_wake_queued(struct domain *d, struct >> mem_event_domain *med) >> +{ >> + int avail_req = mem_event_ring_available(med); >> + >> + if ( avail_req > 0 ) >> + wake_up_nr(&med->wq, avail_req); >> +} >> + >> +/* >> + * mem_event_wake_waiters() will wakeup all vcpus waiting for the ring >> to > > DYM mem_event_wake() ? > >> + * become available. If we have queued vCPUs, they get top priority. >> We >> + * are guaranteed that they will go through code paths that will >> eventually >> + * call mem_event_wake() again, ensuring that any blocked vCPUs will >> get >> + * unpaused once all the queued vCPUs have made it through. >> + */ >> +void mem_event_wake(struct domain *d, struct mem_event_domain *med) >> +{ >> + if (!list_empty(&med->wq.list)) >> + mem_event_wake_queued(d, med); >> + else >> + mem_event_wake_blocked(d, med); >> +} >> + >> +static int mem_event_disable(struct domain *d, struct mem_event_domain >> *med) >> +{ >> + if( med->ring_page ) > > s/if(/if (/g :) > >> + { >> + struct vcpu *v; >> + >> + mem_event_ring_lock(med); >> + >> + if (!list_empty(&med->wq.list)) > > if ( ... ) > >> + { >> + mem_event_ring_unlock(med); >> + return -EBUSY; >> + } >> + >> + unmap_domain_page(med->ring_page); >> + med->ring_page = NULL; >> + >> + unmap_domain_page(med->shared_page); >> + med->shared_page = NULL; >> + >> + /* Wake up everybody */ >> + wake_up_all(&med->wq); >> + >> + /* Unblock all vCPUs */ >> + for_each_vcpu ( d, v ) >> + { >> + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) >> + { >> + vcpu_unpause(v); >> + med->blocked--; >> + } >> + } >> + >> + mem_event_ring_unlock(med); >> + } >> >> return 0; >> } >> >> -void mem_event_put_request(struct domain *d, struct mem_event_domain >> *med, mem_event_request_t *req) >> +static inline void mem_event_release_slot(struct domain *d, >> + struct mem_event_domain *med) >> +{ >> + /* Update the accounting */ >> + if ( current->domain == d ) >> + med->target_producers--; >> + else >> + med->foreign_producers--; >> + >> + /* Kick any waiters */ >> + mem_event_wake(d, med); >> +} >> + >> +/* >> + * mem_event_mark_and_pause() tags vcpu and put it to sleep. >> + * The vcpu will resume execution in mem_event_wake_waiters(). >> + */ >> +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain >> *med) >> +{ >> + if ( !test_and_set_bit(med->pause_flag, &v->pause_flags) ) >> + { >> + vcpu_pause_nosync(v); >> + med->blocked++; >> + } >> +} >> + >> +/* >> + * This must be preceeded by a call to claim_slot(), and is guaranteed >> to > > preceded > > Cheers, > > Tim. >
Olaf Hering
2012-Jan-13 09:50 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Wed, Jan 11, Andres Lagar-Cavilla wrote: A few comments:> -static int mem_event_disable(struct mem_event_domain *med) > +static int mem_event_ring_available(struct mem_event_domain *med) > { > - unmap_domain_page(med->ring_page); > - med->ring_page = NULL; > + int avail_req = RING_FREE_REQUESTS(&med->front_ring); > + avail_req -= med->target_producers; > + avail_req -= med->foreign_producers; > > - unmap_domain_page(med->shared_page); > - med->shared_page = NULL; > + BUG_ON(avail_req < 0); > + > + return avail_req; > +} > +mem_event_ring_available() should return unsigned since the values it provides can only be positive. The function itself enforces this.> -void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) > +int p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) > { > - struct vcpu *v = current; > mem_event_request_t req; > > - /* Check that there''s space on the ring for this request */ > - if ( mem_event_check_ring(d, &d->mem_event->paging) == 0) > - { > - /* Send release notification to pager */ > - memset(&req, 0, sizeof(req)); > - req.flags |= MEM_EVENT_FLAG_DROP_PAGE; > - req.gfn = gfn; > - req.vcpu_id = v->vcpu_id; > + /* We allow no ring in this unique case, because it won''t affect > + * correctness of the guest execution at this point. If this is the only > + * page that happens to be paged-out, we''ll be okay.. but it''s likely the > + * guest will crash shortly anyways. */ > + int rc = mem_event_claim_slot(d, &d->mem_event->paging); > + if ( rc < 0 ) > + return rc; > > - mem_event_put_request(d, &d->mem_event->paging, &req); > - } > + /* Send release notification to pager */ > + memset(&req, 0, sizeof(req)); > + req.type = MEM_EVENT_TYPE_PAGING; > + req.gfn = gfn; > + req.flags = MEM_EVENT_FLAG_DROP_PAGE; > + > + mem_event_put_request(d, &d->mem_event->paging, &req); > + return 0; > }p2m_mem_paging_drop_page() should remain void because the caller has already done its work, making it not restartable. Also it is only called when a gfn is in paging state, which I''m sure can not happen without a ring. And quilt says: Warning: trailing whitespace in lines 167,254 of xen/arch/x86/mm/mem_event.c Warning: trailing whitespace in line 168 of xen/common/memory.c Warning: trailing whitespace in line 1127 of xen/arch/x86/mm/p2m.c Olaf
Andres Lagar-Cavilla
2012-Jan-13 15:14 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
> On Wed, Jan 11, Andres Lagar-Cavilla wrote: > > A few comments: > >> -static int mem_event_disable(struct mem_event_domain *med) >> +static int mem_event_ring_available(struct mem_event_domain *med) >> { >> - unmap_domain_page(med->ring_page); >> - med->ring_page = NULL; >> + int avail_req = RING_FREE_REQUESTS(&med->front_ring); >> + avail_req -= med->target_producers; >> + avail_req -= med->foreign_producers; >> >> - unmap_domain_page(med->shared_page); >> - med->shared_page = NULL; >> + BUG_ON(avail_req < 0); >> + >> + return avail_req; >> +} >> + > > mem_event_ring_available() should return unsigned since the values it > provides can only be positive. The function itself enforces this.Yup.> >> -void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) >> +int p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) >> { >> - struct vcpu *v = current; >> mem_event_request_t req; >> >> - /* Check that there''s space on the ring for this request */ >> - if ( mem_event_check_ring(d, &d->mem_event->paging) == 0) >> - { >> - /* Send release notification to pager */ >> - memset(&req, 0, sizeof(req)); >> - req.flags |= MEM_EVENT_FLAG_DROP_PAGE; >> - req.gfn = gfn; >> - req.vcpu_id = v->vcpu_id; >> + /* We allow no ring in this unique case, because it won''t affect >> + * correctness of the guest execution at this point. If this is >> the only >> + * page that happens to be paged-out, we''ll be okay.. but it''s >> likely the >> + * guest will crash shortly anyways. */ >> + int rc = mem_event_claim_slot(d, &d->mem_event->paging); >> + if ( rc < 0 ) >> + return rc; >> >> - mem_event_put_request(d, &d->mem_event->paging, &req); >> - } >> + /* Send release notification to pager */ >> + memset(&req, 0, sizeof(req)); >> + req.type = MEM_EVENT_TYPE_PAGING; >> + req.gfn = gfn; >> + req.flags = MEM_EVENT_FLAG_DROP_PAGE; >> + >> + mem_event_put_request(d, &d->mem_event->paging, &req); >> + return 0; >> } > > p2m_mem_paging_drop_page() should remain void because the caller has > already done its work, making it not restartable. Also it is only called > when a gfn is in paging state, which I''m sure can not happen without a > ring.Well, the rationale is that returning an error code can only help, should new error conditions arise. Keep in mind that the pager and the ring can disappear at any time, so ENOSYS can still happen.> > And quilt says: > Warning: trailing whitespace in lines 167,254 of > xen/arch/x86/mm/mem_event.c > Warning: trailing whitespace in line 168 of xen/common/memory.c > Warning: trailing whitespace in line 1127 of xen/arch/x86/mm/p2m.c >quilt ... the good times. I''ll refresh and add your signed-off-by to cover the portions of the work that originate from your end, is that ok? Thanks, Andres> > Olaf >
Olaf Hering
2012-Jan-13 19:57 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Fri, Jan 13, Andres Lagar-Cavilla wrote:
> > p2m_mem_paging_drop_page() should remain void because the caller has
> > already done its work, making it not restartable. Also it is only called
> > when a gfn is in paging state, which I'm sure cannot happen without a
> > ring.
>
> Well, the rationale is that returning an error code can only help, should
> new error conditions arise. Keep in mind that the pager and the ring can
> disappear at any time, so ENOSYS can still happen.

The ring should not disappear at all until ->paged_pages drops to zero.
Unless the goal is a restartable pager,
XEN_DOMCTL_MEM_EVENT_OP_PAGING_DISABLE should return -EBUSY when
->paged_pages is not zero. Then the checks in drop_page and populate can
be relaxed.

> I'll refresh and add your signed-off-by to cover the portions of the work
> that originate from your end, is that ok?

I haven't finished the review yet; I still have to check how it may work
with wait queues in gfn_to_mfn*.

Olaf
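The -EBUSY check being suggested for the disable path might look roughly like the following. This is a sketch only, not part of either revision of the patch, and it assumes the paged-out page count is the atomic d->paged_pages counter maintained by the paging code; the exact counter and its location would need checking against the tree.

    case XEN_DOMCTL_MEM_EVENT_OP_PAGING_DISABLE:
    {
        if ( med->ring_page )
        {
            /* Refuse to tear down the ring while gfns are still paged
             * out; the pager has to bring everything back in first. */
            rc = -EBUSY;
            if ( atomic_read(&d->paged_pages) == 0 )  /* assumed counter */
                rc = mem_event_disable(d, med);
        }
    }
    break;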
Andres Lagar-Cavilla
2012-Jan-13 20:05 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
> On Fri, Jan 13, Andres Lagar-Cavilla wrote: > >> > p2m_mem_paging_drop_page() should remain void because the caller has >> > already done its work, making it not restartable. Also it is only >> called >> > when a gfn is in paging state, which I''m sure can not happen without a >> > ring. >> >> Well, the rationale is that returning an error code can only help, >> should >> new error conditions arise. Keep in mind that the pager and the ring can >> disappear at any time, so ENOSYS can still happen. > > The ring should rather not disappear at all until ->paged_pages drops to > zero. Unless the goal is a restartable pager, > XEN_DOMCTL_MEM_EVENT_OP_PAGING_DISABLE should return -EBUSY when > ->pages_pages is not zero. Then the checks in drop_page and populate can > be relaxed.Well, there are two separate things here. Should drop return an error code? No harm in that. Then there is your point about the ring not going away if ->paged_pages is nonzero. Which I like, but is currently not implemented, afaict. Separate patch I guess.> >> I''ll refresh and add your signed-off-by to cover the portions of the >> work >> that originate from your end, is that ok? > > I havent finish the review yet, have to check how it may work with wait > queues in gfn_to_mfn*.Another case of "the two separate things" :) We definitely look forward to wait queue support for gfn_to_mfn*. But that''s a separate consumer of the wait queue feature. Are you worried that this patch might break your gfn_to_mfn* strategy? We did base it on your work. Thanks, Andres> > Olaf >
Olaf Hering
2012-Jan-16 15:04 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Wed, Jan 11, Andres Lagar-Cavilla wrote:
> @@ -898,7 +902,7 @@ void p2m_mem_paging_drop_page(struct dom
>  * already sent to the pager. In this case the caller has to try again until the
>  * gfn is fully paged in again.
>  */
> -void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> +int p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> {
>     struct vcpu *v = current;
>     mem_event_request_t req;

What is the reason to return an error here? None of the callers need to
check it, because they already know their condition (which is
p2m_is_paging()), and your patch doesn't seem to update them. Foreigners
try again, and requests from the target domain currently also try again,
until my change to use wait queues in gfn_to_mfn* is finished. Even then,
the call to p2m_mem_paging_populate() will most likely remain a
one-shot/must-succeed thing if the request is from the target domain.

The only condition to care about in p2m_mem_paging_populate() is -ENOSYS
from mem_event_claim_slot(), which your patch already handles by crashing
the guest. Since the guest is in a kind-of undefined state anyway if the
pager disappears while some gfns are still in paging state, this is one of
the possible actions.

Olaf
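The caller pattern being described is, schematically, the one below. This is a representative sketch rather than a hunk from the patch: touch_gfn() is a made-up name, and -ENOENT stands in for whatever error each real call site propagates so that foreign callers retry.

    static int touch_gfn(struct domain *d, unsigned long gfn)
    {
        p2m_type_t p2mt;
        unsigned long mfn;

        mfn = mfn_x(get_gfn(d, gfn, &p2mt));
        if ( unlikely(p2m_is_paging(p2mt)) )
        {
            put_gfn(d, gfn);
            /* The caller already knows its condition, so the return
             * value of populate() adds nothing here. */
            p2m_mem_paging_populate(d, gfn);
            return -ENOENT;    /* foreign callers retry on this */
        }

        /* ... use mfn ... */
        put_gfn(d, gfn);
        return 0;
    }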
Olaf Hering
2012-Jan-16 15:08 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Fri, Jan 13, Andres Lagar-Cavilla wrote:
> > On Fri, Jan 13, Andres Lagar-Cavilla wrote:
> >
> >> > p2m_mem_paging_drop_page() should remain void because the caller has
> >> > already done its work, making it not restartable. Also it is only
> >> called
> >> > when a gfn is in paging state, which I'm sure cannot happen without a
> >> > ring.
> >>
> >> Well, the rationale is that returning an error code can only help,
> >> should
> >> new error conditions arise. Keep in mind that the pager and the ring can
> >> disappear at any time, so ENOSYS can still happen.
> >
> > The ring should not disappear at all until ->paged_pages drops to
> > zero. Unless the goal is a restartable pager,
> > XEN_DOMCTL_MEM_EVENT_OP_PAGING_DISABLE should return -EBUSY when
> > ->paged_pages is not zero. Then the checks in drop_page and populate can
> > be relaxed.
>
> Well, there are two separate things here. Should drop return an error
> code? No harm in that. Then there is your point about the ring not going
> away if ->paged_pages is nonzero. Which I like, but is currently not
> implemented, afaict. Separate patch I guess.

Since there is no consumer of the added return codes from
p2m_mem_paging_drop_page() and p2m_mem_paging_populate(), I don't see the
point in changing the function prototypes.

I will prepare a patch to improve the check on whether the ring is
available. As you said, it can disappear at any time.

Olaf
Andres Lagar-Cavilla
2012-Jan-16 15:08 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
> On Wed, Jan 11, Andres Lagar-Cavilla wrote: > >> @@ -898,7 +902,7 @@ void p2m_mem_paging_drop_page(struct dom >> * already sent to the pager. In this case the caller has to try again >> until the >> * gfn is fully paged in again. >> */ >> -void p2m_mem_paging_populate(struct domain *d, unsigned long gfn) >> +int p2m_mem_paging_populate(struct domain *d, unsigned long gfn) >> { >> struct vcpu *v = current; >> mem_event_request_t req; > > What is the reason to return an error here? None of the callers need to > check it because they already know their condition (which is > p2m_is_paging()). And your patch doesnt seem to update them. Foreigners > try again, requests from the target domain do currently also try again > until my change to use wait queues in gfn_to_mfn* is finished. And even > then the call p2m_mem_paging_populate() will most likely remain a > one-shot/must-succeed thing if the request is from the target domain. > > The only condition to care about in p2m_mem_paging_populate() is -ENOSYS > from mem_event_claim_slot(), which your patch already handles by > crashing the guest. Since the guest is in an kind-of undefined state > anyway if the pager disappears while some gfns are still in paging state, > this is one of the possible actions.Ok, we''ll make it void. Anything else pending? Thanks Andres> Olaf >
Olaf Hering
2012-Jan-16 15:10 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Wed, Jan 11, Andres Lagar-Cavilla wrote:> xen/arch/x86/hvm/hvm.c | 18 +- > xen/arch/x86/mm/mem_event.c | 298 +++++++++++++++++++++++++++++++++------ > xen/arch/x86/mm/mem_sharing.c | 30 +-- > xen/arch/x86/mm/p2m.c | 81 +++++----- > xen/common/memory.c | 7 +- > xen/include/asm-x86/mem_event.h | 22 +- > xen/include/asm-x86/p2m.h | 12 +- > xen/include/xen/mm.h | 2 + > xen/include/xen/sched.h | 22 ++- > 9 files changed, 359 insertions(+), 133 deletions(-) > > > This patch is an amalgamation of the work done by Olaf Hering <olaf@aepfle.de> > and our work. > > It combines logic changes to simplify the memory event API, as well as > leveraging wait queues to deal with extreme conditions in which too many events are > generated by a guest vcpu.I''m ok with the approach, and it looks like the approach does not conflict with my attempt to use waitqueues in get_gfn_type_access(). If the ring is full, the vcpu is put in a wait queue. Olaf
Olaf Hering
2012-Jan-16 15:22 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Wed, Jan 11, Andres Lagar-Cavilla wrote:> xen/arch/x86/hvm/hvm.c | 18 +- > xen/arch/x86/mm/mem_event.c | 298 +++++++++++++++++++++++++++++++++------ > xen/arch/x86/mm/mem_sharing.c | 30 +-- > xen/arch/x86/mm/p2m.c | 81 +++++----- > xen/common/memory.c | 7 +- > xen/include/asm-x86/mem_event.h | 22 +- > xen/include/asm-x86/p2m.h | 12 +- > xen/include/xen/mm.h | 2 + > xen/include/xen/sched.h | 22 ++- > 9 files changed, 359 insertions(+), 133 deletions(-) > > > This patch is an amalgamation of the work done by Olaf Hering <olaf@aepfle.de> > and our work. > > It combines logic changes to simplify the memory event API, as well as > leveraging wait queues to deal with extreme conditions in which too many events are > generated by a guest vcpu.Signed-off-by: Olaf Hering <olaf@aepfle.de> Olaf
Andres Lagar-Cavilla
2012-Jan-16 15:39 UTC
[PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
xen/arch/x86/hvm/hvm.c | 16 +- xen/arch/x86/mm/mem_event.c | 296 +++++++++++++++++++++++++++++++++------ xen/arch/x86/mm/mem_sharing.c | 30 +-- xen/arch/x86/mm/p2m.c | 78 +++++----- xen/include/asm-x86/mem_event.h | 22 +- xen/include/asm-x86/p2m.h | 9 +- xen/include/xen/mm.h | 2 + xen/include/xen/sched.h | 22 ++- 8 files changed, 347 insertions(+), 128 deletions(-) This patch is an amalgamation of the work done by Olaf Hering <olaf@aepfle.de> and our work. It combines logic changes to simplify the memory event API, as well as leveraging wait queues to deal with extreme conditions in which too many events are generated by a guest vcpu. In order to generate a new event, a slot in the ring is claimed. If a guest vcpu is generating the event and there is no space, it is put on a wait queue. If a foreign vcpu is generating the event and there is no space, the vcpu is expected to retry its operation. If an error happens later, the function returns the claimed slot via a cancel operation. Thus, the API has only four calls: claim slot, cancel claimed slot, put request in the ring, consume the response. With all these mechanisms, no guest events are lost. Our testing includes 1. ballooning down 512 MiBs; 2. using mem access n2rwx, in which every page access in a four-vCPU guest results in an event, with no vCPU pausing, and the four vCPUs touching all RAM. No guest events were lost in either case, and qemu-dm had no mapping problems. For full disclosure, here is Olaf''s log of changes: v7: - add ring accounting so that each vcpu can place at least one request v6: - take foreign requests into account before calling wake_up_nr() - call wake_up_nr() outside of ring lock - rename ->bit to ->pause_flag - update comment in mem_event_enable() v5: - rename mem_event_check_ring() to mem_event_claim_slot() - rename mem_event_put_req_producers() to mem_event_release_slot() - add local/foreign request accounting - keep room for at least one guest request v4: - fix off-by-one bug in _mem_event_put_request - add mem_event_wake_requesters() and use wake_up_nr() - rename mem_event_mark_and_pause() and mem_event_mark_and_pause() functions - req_producers counts foreign request producers, rename member v3: - rename ->mem_event_bit to ->bit - remove me_ from new VPF_ defines v2: - p2m_mem_paging_populate: move vcpu_pause after put_request, otherwise the vcpu will not wake_up after a wait_event because the pause_count was increased twice. Fixes guest hangs. - update free space check in _mem_event_put_request() - simplify mem_event_put_request() This version also includes Olaf''s latest (Jan 16th 2012) comments. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Olaf hering <olaf@aepfle.de> diff -r c0bb1c65797b -r 28c68de1ea25 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4198,15 +4198,21 @@ static int hvm_memory_event_traps(long p if ( (p & HVMPME_onchangeonly) && (value == old) ) return 1; - - rc = mem_event_check_ring(d, &d->mem_event->access); - if ( rc ) + + rc = mem_event_claim_slot(d, &d->mem_event->access); + if ( rc == -ENOSYS ) + { + /* If there was no ring to handle the event, then + * simple continue executing normally. 
*/ + return 1; + } + else if ( rc < 0 ) return rc; - + memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; req.reason = reason; - + if ( (p & HVMPME_MODE_MASK) == HVMPME_mode_sync ) { req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; diff -r c0bb1c65797b -r 28c68de1ea25 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -23,6 +23,7 @@ #include <asm/domain.h> #include <xen/event.h> +#include <xen/wait.h> #include <asm/p2m.h> #include <asm/mem_event.h> #include <asm/mem_paging.h> @@ -41,6 +42,7 @@ static int mem_event_enable( struct domain *d, xen_domctl_mem_event_op_t *mec, struct mem_event_domain *med, + int pause_flag, xen_event_channel_notification_t notification_fn) { int rc; @@ -93,6 +95,9 @@ static int mem_event_enable( put_gfn(dom_mem_event, ring_gfn); put_gfn(dom_mem_event, shared_gfn); + /* Set the number of currently blocked vCPUs to 0. */ + med->blocked = 0; + /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], current->domain->domain_id, @@ -110,8 +115,11 @@ static int mem_event_enable( mem_event_ring_lock_init(med); - /* Wake any VCPUs paused for memory events */ - mem_event_unpause_vcpus(d); + /* Save the pause flag for this particular ring. */ + med->pause_flag = pause_flag; + + /* Initialize the last-chance wait queue. */ + init_waitqueue_head(&med->wq); return 0; @@ -125,48 +133,216 @@ static int mem_event_enable( return rc; } -static int mem_event_disable(struct mem_event_domain *med) +static unsigned int mem_event_ring_available(struct mem_event_domain *med) { - unmap_domain_page(med->ring_page); - med->ring_page = NULL; + int avail_req = RING_FREE_REQUESTS(&med->front_ring); + avail_req -= med->target_producers; + avail_req -= med->foreign_producers; - unmap_domain_page(med->shared_page); - med->shared_page = NULL; + BUG_ON(avail_req < 0); + + return avail_req; +} + +/* + * mem_event_wake_blocked() will wakeup vcpus waiting for room in the + * ring. These vCPUs were paused on their way out after placing an event, + * but need to be resumed where the ring is capable of processing at least + * one event from them. + */ +static void mem_event_wake_blocked(struct domain *d, struct mem_event_domain *med) +{ + struct vcpu *v; + int online = d->max_vcpus; + unsigned int avail_req = mem_event_ring_available(med); + + if ( avail_req == 0 || med->blocked == 0 ) + return; + + /* + * We ensure that we only have vCPUs online if there are enough free slots + * for their memory events to be processed. This will ensure that no + * memory events are lost (due to the fact that certain types of events + * cannot be replayed, we need to ensure that there is space in the ring + * for when they are hit). + * See comment below in mem_event_put_request(). 
+ */ + for_each_vcpu ( d, v ) + if ( test_bit(med->pause_flag, &v->pause_flags) ) + online--; + + ASSERT(online == (d->max_vcpus - med->blocked)); + + /* We remember which vcpu last woke up to avoid scanning always linearly + * from zero and starving higher-numbered vcpus under high load */ + if ( d->vcpu ) + { + int i, j, k; + + for (i = med->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++) + { + k = i % d->max_vcpus; + v = d->vcpu[k]; + if ( !v ) + continue; + + if ( !(med->blocked) || online >= avail_req ) + break; + + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) + { + vcpu_unpause(v); + online++; + med->blocked--; + med->last_vcpu_wake_up = k; + } + } + } +} + +/* + * In the event that a vCPU attempted to place an event in the ring and + * was unable to do so, it is queued on a wait queue. These are woken as + * needed, and take precedence over the blocked vCPUs. + */ +static void mem_event_wake_queued(struct domain *d, struct mem_event_domain *med) +{ + unsigned int avail_req = mem_event_ring_available(med); + + if ( avail_req > 0 ) + wake_up_nr(&med->wq, avail_req); +} + +/* + * mem_event_wake() will wakeup all vcpus waiting for the ring to + * become available. If we have queued vCPUs, they get top priority. We + * are guaranteed that they will go through code paths that will eventually + * call mem_event_wake() again, ensuring that any blocked vCPUs will get + * unpaused once all the queued vCPUs have made it through. + */ +void mem_event_wake(struct domain *d, struct mem_event_domain *med) +{ + if (!list_empty(&med->wq.list)) + mem_event_wake_queued(d, med); + else + mem_event_wake_blocked(d, med); +} + +static int mem_event_disable(struct domain *d, struct mem_event_domain *med) +{ + if ( med->ring_page ) + { + struct vcpu *v; + + mem_event_ring_lock(med); + + if ( !list_empty(&med->wq.list) ) + { + mem_event_ring_unlock(med); + return -EBUSY; + } + + unmap_domain_page(med->ring_page); + med->ring_page = NULL; + + unmap_domain_page(med->shared_page); + med->shared_page = NULL; + + /* Unblock all vCPUs */ + for_each_vcpu ( d, v ) + { + if ( test_and_clear_bit(med->pause_flag, &v->pause_flags) ) + { + vcpu_unpause(v); + med->blocked--; + } + } + + mem_event_ring_unlock(med); + } return 0; } -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +static inline void mem_event_release_slot(struct domain *d, + struct mem_event_domain *med) +{ + /* Update the accounting */ + if ( current->domain == d ) + med->target_producers--; + else + med->foreign_producers--; + + /* Kick any waiters */ + mem_event_wake(d, med); +} + +/* + * mem_event_mark_and_pause() tags vcpu and put it to sleep. + * The vcpu will resume execution in mem_event_wake_waiters(). + */ +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med) +{ + if ( !test_and_set_bit(med->pause_flag, &v->pause_flags) ) + { + vcpu_pause_nosync(v); + med->blocked++; + } +} + +/* + * This must be preceded by a call to claim_slot(), and is guaranteed to + * succeed. As a side-effect however, the vCPU may be paused if the ring is + * overly full and its continued execution would cause stalling and excessive + * waiting. The vCPU will be automatically unpaused when the ring clears. 
+ */ +void mem_event_put_request(struct domain *d, + struct mem_event_domain *med, + mem_event_request_t *req) { mem_event_front_ring_t *front_ring; + int free_req; + unsigned int avail_req; RING_IDX req_prod; - mem_event_ring_lock(med); - - front_ring = &med->front_ring; - req_prod = front_ring->req_prod_pvt; - if ( current->domain != d ) { req->flags |= MEM_EVENT_FLAG_FOREIGN; ASSERT( !(req->flags & MEM_EVENT_FLAG_VCPU_PAUSED) ); } + mem_event_ring_lock(med); + + /* Due to the reservations, this step must succeed. */ + front_ring = &med->front_ring; + free_req = RING_FREE_REQUESTS(front_ring); + ASSERT(free_req > 0); + /* Copy request */ + req_prod = front_ring->req_prod_pvt; memcpy(RING_GET_REQUEST(front_ring, req_prod), req, sizeof(*req)); req_prod++; /* Update ring */ - med->req_producers--; front_ring->req_prod_pvt = req_prod; RING_PUSH_REQUESTS(front_ring); + /* We''ve actually *used* our reservation, so release the slot. */ + mem_event_release_slot(d, med); + + /* Give this vCPU a black eye if necessary, on the way out. + * See the comments above wake_blocked() for more information + * on how this mechanism works to avoid waiting. */ + avail_req = mem_event_ring_available(med); + if( current->domain == d && avail_req < d->max_vcpus ) + mem_event_mark_and_pause(current, med); + mem_event_ring_unlock(med); notify_via_xen_event_channel(d, med->xen_port); } -int mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp) +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, mem_event_response_t *rsp) { mem_event_front_ring_t *front_ring; RING_IDX rsp_cons; @@ -190,57 +366,81 @@ int mem_event_get_response(struct mem_ev front_ring->rsp_cons = rsp_cons; front_ring->sring->rsp_event = rsp_cons + 1; + /* Kick any waiters -- since we''ve just consumed an event, + * there may be additional space available in the ring. */ + mem_event_wake(d, med); + mem_event_ring_unlock(med); return 1; } -void mem_event_unpause_vcpus(struct domain *d) -{ - struct vcpu *v; - - for_each_vcpu ( d, v ) - if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) - vcpu_wake(v); -} - -void mem_event_mark_and_pause(struct vcpu *v) -{ - set_bit(_VPF_mem_event, &v->pause_flags); - vcpu_sleep_nosync(v); -} - -void mem_event_put_req_producers(struct mem_event_domain *med) +void mem_event_cancel_slot(struct domain *d, struct mem_event_domain *med) { mem_event_ring_lock(med); - med->req_producers--; + mem_event_release_slot(d, med); mem_event_ring_unlock(med); } -int mem_event_check_ring(struct domain *d, struct mem_event_domain *med) +static int mem_event_grab_slot(struct mem_event_domain *med, int foreign) { - struct vcpu *curr = current; - int free_requests; - int ring_full = 1; + unsigned int avail_req; if ( !med->ring_page ) - return -1; + return -ENOSYS; mem_event_ring_lock(med); - free_requests = RING_FREE_REQUESTS(&med->front_ring); - if ( med->req_producers < free_requests ) + avail_req = mem_event_ring_available(med); + if ( avail_req == 0 ) { - med->req_producers++; - ring_full = 0; + mem_event_ring_unlock(med); + return -EBUSY; } - if ( ring_full && (curr->domain == d) ) - mem_event_mark_and_pause(curr); + if ( !foreign ) + med->target_producers++; + else + med->foreign_producers++; mem_event_ring_unlock(med); - return ring_full; + return 0; +} + +/* Simple try_grab wrapper for use in the wait_event() macro. 
*/ +static int mem_event_wait_try_grab(struct mem_event_domain *med, int *rc) +{ + *rc = mem_event_grab_slot(med, 0); + return *rc; +} + +/* Call mem_event_grab_slot() until the ring doesn''t exist, or is available. */ +static int mem_event_wait_slot(struct mem_event_domain *med) +{ + int rc = -EBUSY; + wait_event(med->wq, mem_event_wait_try_grab(med, &rc) != -EBUSY); + return rc; +} + +/* + * Determines whether or not the current vCPU belongs to the target domain, + * and calls the appropriate wait function. If it is a guest vCPU, then we + * use mem_event_wait_slot() to reserve a slot. As long as there is a ring, + * this function will always return 0 for a guest. For a non-guest, we check + * for space and return -EBUSY if the ring is not available. + * + * Return codes: -ENOSYS: the ring is not yet configured + * -EBUSY: the ring is busy + * 0: a spot has been reserved + * + */ +int mem_event_claim_slot(struct domain *d, struct mem_event_domain *med) +{ + if ( current->domain == d ) + return mem_event_wait_slot(med); + else + return mem_event_grab_slot(med, 1); } /* Registered with Xen-bound event channel for incoming notifications. */ @@ -316,14 +516,14 @@ int mem_event_domctl(struct domain *d, x if ( p2m->pod.entry_count ) break; - rc = mem_event_enable(d, mec, med, mem_paging_notification); + rc = mem_event_enable(d, mec, med, _VPF_mem_paging, mem_paging_notification); } break; case XEN_DOMCTL_MEM_EVENT_OP_PAGING_DISABLE: { if ( med->ring_page ) - rc = mem_event_disable(med); + rc = mem_event_disable(d, med); } break; @@ -355,14 +555,14 @@ int mem_event_domctl(struct domain *d, x if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) break; - rc = mem_event_enable(d, mec, med, mem_access_notification); + rc = mem_event_enable(d, mec, med, _VPF_mem_access, mem_access_notification); } break; case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE: { if ( med->ring_page ) - rc = mem_event_disable(med); + rc = mem_event_disable(d, med); } break; diff -r c0bb1c65797b -r 28c68de1ea25 xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -253,18 +253,10 @@ static void mem_sharing_audit(void) #endif -static struct page_info* mem_sharing_alloc_page(struct domain *d, - unsigned long gfn) +static void mem_sharing_notify_helper(struct domain *d, unsigned long gfn) { - struct page_info* page; struct vcpu *v = current; - mem_event_request_t req; - - page = alloc_domheap_page(d, 0); - if(page != NULL) return page; - - memset(&req, 0, sizeof(req)); - req.type = MEM_EVENT_TYPE_SHARED; + mem_event_request_t req = { .type = MEM_EVENT_TYPE_SHARED }; if ( v->domain != d ) { @@ -275,20 +267,21 @@ static struct page_info* mem_sharing_all gdprintk(XENLOG_ERR, "Failed alloc on unshare path for foreign (%d) lookup\n", d->domain_id); - return page; + return; } + if (mem_event_claim_slot(d, &d->mem_event->share) < 0) + { + return; + } + + req.flags = MEM_EVENT_FLAG_VCPU_PAUSED; vcpu_pause_nosync(v); - req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; - - if(mem_event_check_ring(d, &d->mem_event->share)) return page; req.gfn = gfn; req.p2mt = p2m_ram_shared; req.vcpu_id = v->vcpu_id; mem_event_put_request(d, &d->mem_event->share, &req); - - return page; } unsigned int mem_sharing_get_nr_saved_mfns(void) @@ -301,7 +294,7 @@ int mem_sharing_sharing_resume(struct do mem_event_response_t rsp; /* Get all requests off the ring */ - while ( mem_event_get_response(&d->mem_event->share, &rsp) ) + while ( mem_event_get_response(d, &d->mem_event->share, &rsp) ) { if ( rsp.flags & 
MEM_EVENT_FLAG_DUMMY ) continue; @@ -658,13 +651,14 @@ gfn_found: if(ret == 0) goto private_page_found; old_page = page; - page = mem_sharing_alloc_page(d, gfn); + page = alloc_domheap_page(d, 0); if(!page) { /* We''ve failed to obtain memory for private page. Need to re-add the * gfn_info to relevant list */ list_add(&gfn_info->list, &hash_entry->gfns); put_gfn(d, gfn); + mem_sharing_notify_helper(d, gfn); shr_unlock(); return -ENOMEM; } diff -r c0bb1c65797b -r 28c68de1ea25 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -861,20 +861,23 @@ int p2m_mem_paging_evict(struct domain * */ void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) { - struct vcpu *v = current; mem_event_request_t req; - /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) == 0) - { - /* Send release notification to pager */ - memset(&req, 0, sizeof(req)); - req.flags |= MEM_EVENT_FLAG_DROP_PAGE; - req.gfn = gfn; - req.vcpu_id = v->vcpu_id; + /* We allow no ring in this unique case, because it won''t affect + * correctness of the guest execution at this point. If this is the only + * page that happens to be paged-out, we''ll be okay.. but it''s likely the + * guest will crash shortly anyways. */ + int rc = mem_event_claim_slot(d, &d->mem_event->paging); + if ( rc < 0 ) + return; - mem_event_put_request(d, &d->mem_event->paging, &req); - } + /* Send release notification to pager */ + memset(&req, 0, sizeof(req)); + req.type = MEM_EVENT_TYPE_PAGING; + req.gfn = gfn; + req.flags = MEM_EVENT_FLAG_DROP_PAGE; + + mem_event_put_request(d, &d->mem_event->paging, &req); } /** @@ -898,7 +901,7 @@ void p2m_mem_paging_drop_page(struct dom * already sent to the pager. In this case the caller has to try again until the * gfn is fully paged in again. */ -void p2m_mem_paging_populate(struct domain *d, unsigned long gfn) +int p2m_mem_paging_populate(struct domain *d, unsigned long gfn) { struct vcpu *v = current; mem_event_request_t req; @@ -907,9 +910,17 @@ void p2m_mem_paging_populate(struct doma mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); - /* Check that there''s space on the ring for this request */ - if ( mem_event_check_ring(d, &d->mem_event->paging) ) - return; + /* We''re paging. 
There should be a ring */ + int rc = mem_event_claim_slot(d, &d->mem_event->paging); + if ( rc == -ENOSYS ) + { + gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring " + "in place\n", d->domain_id, gfn); + domain_crash(d); + return 0; + } + else if ( rc < 0 ) + return rc; memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_PAGING; @@ -929,7 +940,7 @@ void p2m_mem_paging_populate(struct doma p2m_unlock(p2m); /* Pause domain if request came from guest and gfn has paging type */ - if ( p2m_is_paging(p2mt) && v->domain == d ) + if ( p2m_is_paging(p2mt) && v->domain == d ) { vcpu_pause_nosync(v); req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; @@ -938,8 +949,8 @@ void p2m_mem_paging_populate(struct doma else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) { /* gfn is already on its way back and vcpu is not paused */ - mem_event_put_req_producers(&d->mem_event->paging); - return; + mem_event_cancel_slot(d, &d->mem_event->paging); + return 0; } /* Send request to pager */ @@ -948,6 +959,7 @@ void p2m_mem_paging_populate(struct doma req.vcpu_id = v->vcpu_id; mem_event_put_request(d, &d->mem_event->paging, &req); + return 0; } /** @@ -1065,7 +1077,7 @@ void p2m_mem_paging_resume(struct domain mfn_t mfn; /* Pull all responses off the ring */ - while( mem_event_get_response(&d->mem_event->paging, &rsp) ) + while( mem_event_get_response(d, &d->mem_event->paging, &rsp) ) { if ( rsp.flags & MEM_EVENT_FLAG_DUMMY ) continue; @@ -1090,9 +1102,6 @@ void p2m_mem_paging_resume(struct domain if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); } - - /* Unpause any domains that were paused because the ring was full */ - mem_event_unpause_vcpus(d); } bool_t p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, @@ -1103,7 +1112,6 @@ bool_t p2m_mem_access_check(unsigned lon unsigned long gfn = gpa >> PAGE_SHIFT; struct domain *d = v->domain; struct p2m_domain* p2m = p2m_get_hostp2m(d); - int res; mfn_t mfn; p2m_type_t p2mt; p2m_access_t p2ma; @@ -1126,17 +1134,16 @@ bool_t p2m_mem_access_check(unsigned lon p2m_unlock(p2m); /* Otherwise, check if there is a memory event listener, and send the message along */ - res = mem_event_check_ring(d, &d->mem_event->access); - if ( res < 0 ) + if ( mem_event_claim_slot(d, &d->mem_event->access) == -ENOSYS ) { /* No listener */ if ( p2m->access_required ) { - printk(XENLOG_INFO - "Memory access permissions failure, no mem_event listener: pausing VCPU %d, dom %d\n", - v->vcpu_id, d->domain_id); - - mem_event_mark_and_pause(v); + gdprintk(XENLOG_INFO, "Memory access permissions failure, " + "no mem_event listener VCPU %d, dom %d\n", + v->vcpu_id, d->domain_id); + domain_crash(v->domain); + return 0; } else { @@ -1149,11 +1156,7 @@ bool_t p2m_mem_access_check(unsigned lon } return 1; } - - return 0; } - else if ( res > 0 ) - return 0; /* No space in buffer; VCPU paused */ memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; @@ -1188,7 +1191,7 @@ void p2m_mem_access_resume(struct domain mem_event_response_t rsp; /* Pull all responses off the ring */ - while( mem_event_get_response(&d->mem_event->access, &rsp) ) + while( mem_event_get_response(d, &d->mem_event->access, &rsp) ) { if ( rsp.flags & MEM_EVENT_FLAG_DUMMY ) continue; @@ -1196,13 +1199,8 @@ void p2m_mem_access_resume(struct domain if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) vcpu_unpause(d->vcpu[rsp.vcpu_id]); } - - /* Unpause any domains that were paused because the ring was full or no listener - * was available */ - 
mem_event_unpause_vcpus(d); } - /* Set access type for a region of pfns. * If start_pfn == -1ul, sets the default access type */ int p2m_set_mem_access(struct domain *d, unsigned long start_pfn, diff -r c0bb1c65797b -r 28c68de1ea25 xen/include/asm-x86/mem_event.h --- a/xen/include/asm-x86/mem_event.h +++ b/xen/include/asm-x86/mem_event.h @@ -24,18 +24,24 @@ #ifndef __MEM_EVENT_H__ #define __MEM_EVENT_H__ -/* Pauses VCPU while marking pause flag for mem event */ -void mem_event_mark_and_pause(struct vcpu *v); -int mem_event_check_ring(struct domain *d, struct mem_event_domain *med); -void mem_event_put_req_producers(struct mem_event_domain *med); -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req); -int mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp); -void mem_event_unpause_vcpus(struct domain *d); +/* Returns 0 on success, -ENOSYS if there is no ring, -EBUSY if there is no + * available space. For success or -EBUSY, the vCPU may be left blocked + * temporarily to ensure that the ring does not lose future events. In + * general, you must follow a claim_slot() call with either put_request() or + * cancel_slot(), both of which are guaranteed to succeed. */ +int mem_event_claim_slot(struct domain *d, struct mem_event_domain *med); + +void mem_event_cancel_slot(struct domain *d, struct mem_event_domain *med); + +void mem_event_put_request(struct domain *d, struct mem_event_domain *med, + mem_event_request_t *req); + +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, + mem_event_response_t *rsp); int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl); - #endif /* __MEM_EVENT_H__ */ diff -r c0bb1c65797b -r 28c68de1ea25 xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -480,16 +480,15 @@ int p2m_mem_paging_evict(struct domain * /* Tell xenpaging to drop a paged out frame */ void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn); /* Start populating a paged out frame */ -void p2m_mem_paging_populate(struct domain *d, unsigned long gfn); +int p2m_mem_paging_populate(struct domain *d, unsigned long gfn); /* Prepare the p2m for paging a frame in */ int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer); /* Resume normal operation (in case a domain was paused) */ void p2m_mem_paging_resume(struct domain *d); #else -static inline void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn) -{ } -static inline void p2m_mem_paging_populate(struct domain *d, unsigned long gfn) -{ } +#define p2m_mem_paging_drop_page(d, g) ((void)0) +static inline int p2m_mem_paging_populate(struct domain *d, unsigned long gfn) +{ return 0; } #endif #ifdef __x86_64__ diff -r c0bb1c65797b -r 28c68de1ea25 xen/include/xen/mm.h --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -318,6 +318,8 @@ page_list_splice(struct page_list_head * void scrub_one_page(struct page_info *); +/* Returns 1 on success, 0 on error, negative if the ring + * for event propagation is full in the presence of paging */ int guest_remove_page(struct domain *d, unsigned long gmfn); #define RAM_TYPE_CONVENTIONAL 0x00000001 diff -r c0bb1c65797b -r 28c68de1ea25 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -14,6 +14,7 @@ #include <xen/nodemask.h> #include <xen/radix-tree.h> #include <xen/multicall.h> +#include <xen/wait.h> #include <public/xen.h> #include <public/domctl.h> 
#include <public/sysctl.h> @@ -183,7 +184,9 @@ struct mem_event_domain { /* ring lock */ spinlock_t ring_lock; - unsigned int req_producers; + /* The ring has 64 entries */ + unsigned char foreign_producers; + unsigned char target_producers; /* shared page */ mem_event_shared_page_t *shared_page; /* shared ring page */ @@ -192,6 +195,14 @@ struct mem_event_domain mem_event_front_ring_t front_ring; /* event channel port (vcpu0 only) */ int xen_port; + /* mem_event bit for vcpu->pause_flags */ + int pause_flag; + /* list of vcpus waiting for room in the ring */ + struct waitqueue_head wq; + /* the number of vCPUs blocked */ + unsigned int blocked; + /* The last vcpu woken up */ + unsigned int last_vcpu_wake_up; }; struct mem_event_per_domain @@ -615,9 +626,12 @@ static inline struct domain *next_domain /* VCPU affinity has changed: migrating to a new CPU. */ #define _VPF_migrating 3 #define VPF_migrating (1UL<<_VPF_migrating) - /* VCPU is blocked on memory-event ring. */ -#define _VPF_mem_event 4 -#define VPF_mem_event (1UL<<_VPF_mem_event) + /* VCPU is blocked due to missing mem_paging ring. */ +#define _VPF_mem_paging 4 +#define VPF_mem_paging (1UL<<_VPF_mem_paging) + /* VCPU is blocked due to missing mem_access ring. */ +#define _VPF_mem_access 5 +#define VPF_mem_access (1UL<<_VPF_mem_access) static inline int vcpu_runnable(struct vcpu *v) {
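[Editorial note] To summarize the producer-side contract that the call sites above (hvm.c, mem_sharing.c, p2m.c) now follow, here is a minimal sketch. The function name, the placeholder failure check, and the choice of the paging ring are illustrative assumptions; the API calls and request fields are those declared in asm-x86/mem_event.h by this patch.

/* Illustrative sketch only: a hypothetical producer following the
 * claim/cancel/put pattern introduced by this patch.  Assumes the same
 * Xen-internal headers as the real call sites (xen/sched.h, xen/lib.h,
 * asm/mem_event.h). */
static int example_notify_pager(struct domain *d, unsigned long gfn)
{
    mem_event_request_t req;
    int rc = mem_event_claim_slot(d, &d->mem_event->paging);

    if ( rc == -ENOSYS )
        return rc;   /* no ring configured for this domain */
    if ( rc < 0 )
        return rc;   /* -EBUSY: a foreign vcpu is expected to retry;
                      * a guest vcpu would have waited inside claim_slot() */

    /* A slot is now reserved, so mem_event_put_request() cannot fail.
     * If a later step fails, the reservation must be given back. */
    rc = 0;          /* stand-in for further work that might fail */
    if ( rc < 0 )
    {
        mem_event_cancel_slot(d, &d->mem_event->paging);
        return rc;
    }

    memset(&req, 0, sizeof(req));
    req.type    = MEM_EVENT_TYPE_PAGING;
    req.gfn     = gfn;
    req.vcpu_id = current->vcpu_id;

    mem_event_put_request(d, &d->mem_event->paging, &req);
    return 0;
}

The consumer side is symmetric: the *_resume() helpers above simply loop on mem_event_get_response() until the ring is drained, waking any queued or blocked vCPUs as a side effect.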
Olaf Hering
2012-Jan-16 15:41 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Mon, Jan 16, Andres Lagar-Cavilla wrote:

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -480,16 +480,15 @@ int p2m_mem_paging_evict(struct domain *
>  /* Tell xenpaging to drop a paged out frame */
>  void p2m_mem_paging_drop_page(struct domain *d, unsigned long gfn);
>  /* Start populating a paged out frame */
> -void p2m_mem_paging_populate(struct domain *d, unsigned long gfn);
> +int p2m_mem_paging_populate(struct domain *d, unsigned long gfn);

As I said, this should remain void because the callers can't possibly care
about errors.

Olaf
Olaf Hering
2012-Jan-16 16:22 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
On Mon, Jan 16, Olaf Hering wrote:

> I will prepare a patch to improve the check whether the ring is
> available. As you said, it can disappear any time.

Your changes to mem_event_disable() already contain the changes I would
have made as well, making changes only with the ring lock held.

Minor detail: in mem_event_enable(), the call to mem_event_ring_lock_init()
should be moved up before the pointers are assigned, to close the tiny
window between lock init and assignment.

Olaf
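[Editorial note] The ordering being requested here looks roughly like the sketch below. It is illustrative only: the helper name and local variable names are invented, and the real mem_event_enable() body also maps the pages and sets up the event channel.

/* Sketch of the suggested ordering for mem_event_enable(); names are
 * illustrative and not taken from the patch. */
static void mem_event_publish_ring(struct mem_event_domain *med,
                                   mem_event_shared_page_t *shared_va,
                                   void *ring_va)
{
    /* Initialize the lock while the ring is still invisible... */
    mem_event_ring_lock_init(med);

    /* ...so that any path which tests med->ring_page and then takes the
     * ring lock can never observe the pointers with an uninitialized lock. */
    med->shared_page = shared_va;
    med->ring_page   = ring_va;
}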
Tim Deegan
2012-Jan-19 10:39 UTC
Re: [PATCH] x86/mm: Improve ring management for memory events. Do not lose guest events
At 10:39 -0500 on 16 Jan (1326710344), Andres Lagar-Cavilla wrote:

> This patch is an amalgamation of the work done by Olaf Hering <olaf@aepfle.de>
> and our work.
>
> It combines logic changes to simplify the memory event API, as well as
> leveraging wait queues to deal with extreme conditions in which too many events are
> generated by a guest vcpu.
>
> In order to generate a new event, a slot in the ring is claimed. If a guest vcpu
> is generating the event and there is no space, it is put on a wait queue. If a
> foreign vcpu is generating the event and there is no space, the vcpu is expected
> to retry its operation. If an error happens later, the function returns the
> claimed slot via a cancel operation.
>
> Thus, the API has only four calls: claim slot, cancel claimed slot, put request
> in the ring, consume the response.
>
> With all these mechanisms, no guest events are lost.
> Our testing includes 1. ballooning down 512 MiBs; 2. using mem access n2rwx, in
> which every page access in a four-vCPU guest results in an event, with no vCPU
> pausing, and the four vCPUs touching all RAM. No guest events were lost in
> either case, and qemu-dm had no mapping problems.

Applied, thanks. I made two changes, both suggested by Olaf:
 - moved the lock init up in mem_event_enable(); and
 - reverted p2m_mem_paging_populate() to return void, as none of the
   callers had been changed to care about its new return value.

In general, the callers of p2m_mem_paging_populate() are still a bit of
a mess; that should all be happening behind the p2m interface. But
that's for another time...

Cheers,

Tim.