Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 0 of 4] Mem event handling improvements
Several improvements to the mem event interface are included in this series. Most notably:

- The ability to trigger resume actions in the hypervisor via an event channel kick, as opposed to a domctl. Less locking and more batching.
- Improvements to the management of the mem event ring: better return codes, vcpu pause semantics, and handling of corner cases.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Signed-off-by: Adin Scannell <adin@scannell.ca>
Signed-off-by: Keir Fraser <keir@xen.org>

 xen/arch/x86/hvm/hvm.c          | 21 +++-
 xen/arch/x86/mm/mem_event.c     | 173 +++++++++++++++++++++++++++++----------
 xen/arch/x86/mm/mem_sharing.c   | 27 ++++-
 xen/arch/x86/mm/p2m.c           | 100 ++++++++++++----------
 xen/include/asm-x86/mem_event.h | 16 ++-
 xen/include/xen/sched.h         | 5 +-
 xen/arch/ia64/vmx/vmx_init.c    | 2 +-
 xen/arch/x86/hvm/hvm.c          | 5 +-
 xen/arch/x86/mm/mem_event.c     | 3 +-
 xen/common/event_channel.c      | 75 +++++++++++++---
 xen/include/xen/event.h         | 5 +-
 xen/include/xen/sched.h         | 2 +-
 xen/arch/x86/mm/mem_access.c    | 3 +-
 xen/arch/x86/mm/p2m.c           | 3 +-
 xen/include/asm-x86/p2m.h       | 2 +-
 xen/arch/x86/mm/mem_event.c     | 26 ++++-
 16 files changed, 324 insertions(+), 144 deletions(-)
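To illustrate the first point, here is a rough sketch (not part of the posted series) of the consumer side of the event-channel kick: instead of issuing a resume domctl per event, the pager writes its responses into the back ring and notifies the Xen-bound event channel, with one notification covering a whole batch. The names xce, port and back_ring stand in for the pager's existing mem_event setup, and the libxc calls are the ones assumed available in this timeframe.

    #include <string.h>
    #include <xenctrl.h>
    #include <xen/io/ring.h>
    #include <xen/mem_event.h>

    /* Push one response and kick the event channel; purely illustrative. */
    static int put_response_and_kick(xc_evtchn *xce, evtchn_port_t port,
                                     mem_event_back_ring_t *back_ring,
                                     const mem_event_response_t *rsp)
    {
        RING_IDX rsp_prod = back_ring->rsp_prod_pvt;

        /* Copy the response into the shared ring and make it visible. */
        memcpy(RING_GET_RESPONSE(back_ring, rsp_prod), rsp, sizeof(*rsp));
        back_ring->rsp_prod_pvt = rsp_prod + 1;
        RING_PUSH_RESPONSES(back_ring);

        /* One kick can cover a batch of responses; no domctl required. */
        return xc_evtchn_notify(xce, port);
    }

A caller could push several responses and issue a single xc_evtchn_notify() at the end of the batch, which is the "less locking and more batching" claimed above.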
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 1 of 4] Improve ring management for memory events
xen/arch/x86/hvm/hvm.c | 21 +++- xen/arch/x86/mm/mem_event.c | 173 +++++++++++++++++++++++++++++---------- xen/arch/x86/mm/mem_sharing.c | 27 ++++- xen/arch/x86/mm/p2m.c | 100 ++++++++++++---------- xen/include/asm-x86/mem_event.h | 16 ++- xen/include/xen/sched.h | 5 +- 6 files changed, 232 insertions(+), 110 deletions(-) The memevent code currently has a mechanism for reserving space in the ring before putting an event, but each caller must individually ensure that the vCPUs are correctly paused if no space is available. This fixes that issue by reversing the semantics, we ensure that enough space is always left for one event per vCPU in the ring. If this constraint will be violated by the current vCPU putting another request in the ring, we pause it after putting the current event. If no foreign mappings cause mem events, then no mem events will be lost. Because events caused by foreign mapping are effectively unbound, we cannot guarantee with certainty lack of event loss. Additionally, we ensure that no events are lost by Xen as a consumer if multiple responses land on the ring for a single domctl. While the current domctl-based notifications to Xen disallow batching, this is required for later patches. Finally, mark_and_pause_vcpus was misleading, beacause it wasn''t strictly pausing the vcpu''s, rather sending them to sleep. Change to actual vcpu_pause, which protects wakeups via an atomic count. This is useful when an event pauses a vcpu and the vcpu gets mark and paused due to ring congestion. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4077,7 +4077,6 @@ static int hvm_memory_event_traps(long p struct vcpu* v = current; struct domain *d = v->domain; mem_event_request_t req; - int rc; if ( !(p & HVMPME_MODE_MASK) ) return 0; @@ -4085,10 +4084,6 @@ static int hvm_memory_event_traps(long p if ( (p & HVMPME_onchangeonly) && (value == old) ) return 1; - rc = mem_event_check_ring(d, &d->mem_access); - if ( rc ) - return rc; - memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; req.reason = reason; @@ -4108,7 +4103,21 @@ static int hvm_memory_event_traps(long p req.gla_valid = 1; } - mem_event_put_request(d, &d->mem_access, &req); + if ( mem_event_put_request(d, &d->mem_access, &req) ) + { + /* rc == -ENOSYS + * If there was no ring to send the event, then simply unpause the + * vcpu (if we had previously paused it). It will retry the + * instruction (with the exception "handled") and go on + * rc == -EBUSY + * If the ring is full, the vcpu has been marked and paused, + * so we also need to unpause it (if we had previously paused + * it). Once the consumer depletes the ring, the vcpu will be + * woken up and will retry. The consumer may have lost this + * exception, though. */ + if ( req.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(v); + } return 1; } diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -91,6 +91,9 @@ static int mem_event_enable(struct domai put_gfn(dom_mem_event, ring_gfn); put_gfn(dom_mem_event, shared_gfn); + /* Set the number of currently blocked vCPUs to 0. 
*/ + med->blocked = 0; + /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], current->domain->domain_id); @@ -108,7 +111,7 @@ static int mem_event_enable(struct domai mem_event_ring_lock_init(med); /* Wake any VCPUs paused for memory events */ - mem_event_unpause_vcpus(d); + mem_event_unpause_vcpus(d, med); return 0; @@ -133,31 +136,83 @@ static int mem_event_disable(struct mem_ return 0; } -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +static inline int mem_event_ring_free(struct domain *d, struct mem_event_domain *med) { + int free_requests; + + free_requests = RING_FREE_REQUESTS(&med->front_ring); + if ( unlikely(free_requests < d->max_vcpus) ) + { + /* This may happen during normal operation (hopefully not often). */ + gdprintk(XENLOG_INFO, "mem_event request slots for domain %d: %d\n", + d->domain_id, free_requests); + } + + return free_requests; +} + +/* Return values + * zero: success + * -ENOSYS: no ring + * -EAGAIN: ring is full and the event has not been transmitted. + * Only foreign vcpu''s get EAGAIN + * -EBUSY: guest vcpu has been paused due to ring congestion + */ +int mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req) +{ + int ret = 0; mem_event_front_ring_t *front_ring; RING_IDX req_prod; + if ( mem_event_check_ring(d, med) ) + return -ENOSYS; + mem_event_ring_lock(med); - front_ring = &med->front_ring; - req_prod = front_ring->req_prod_pvt; + if ( mem_event_ring_free(d, med) == 0 ) + { + /* This *may* happen, but only when foreign mappings generate events. */ + gdprintk(XENLOG_WARNING, "possible lost mem_event for domain %d\n", + d->domain_id); + ret = -EAGAIN; + } + else + { + front_ring = &med->front_ring; + req_prod = front_ring->req_prod_pvt; - /* Copy request */ - memcpy(RING_GET_REQUEST(front_ring, req_prod), req, sizeof(*req)); - req_prod++; + /* Copy request */ + memcpy(RING_GET_REQUEST(front_ring, req_prod), req, sizeof(*req)); + req_prod++; - /* Update ring */ - med->req_producers--; - front_ring->req_prod_pvt = req_prod; - RING_PUSH_REQUESTS(front_ring); + /* Update ring */ + front_ring->req_prod_pvt = req_prod; + RING_PUSH_REQUESTS(front_ring); + } + + /* + * We ensure that each vcpu can put at least *one* event -- because some + * events are not repeatable, such as dropping a page. This will ensure no + * vCPU is left with an event that they must place on the ring, but cannot. + * They will be paused after the event is placed. + * See large comment below in mem_event_unpause_vcpus(). 
+ */ + if ( current->domain == d && + mem_event_ring_free(d, med) < d->max_vcpus ) + { + mem_event_mark_and_pause(current, med); + /* For guest vcpu, overwrite EAGAIN */ + ret = -EBUSY; + } mem_event_ring_unlock(med); notify_via_xen_event_channel(d, med->xen_port); + + return ret; } -void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp) +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, mem_event_response_t *rsp) { mem_event_front_ring_t *front_ring; RING_IDX rsp_cons; @@ -167,6 +222,12 @@ void mem_event_get_response(struct mem_e front_ring = &med->front_ring; rsp_cons = front_ring->rsp_cons; + if ( !RING_HAS_UNCONSUMED_RESPONSES(front_ring) ) + { + mem_event_ring_unlock(med); + return 0; + } + /* Copy response */ memcpy(rsp, RING_GET_RESPONSE(front_ring, rsp_cons), sizeof(*rsp)); rsp_cons++; @@ -176,54 +237,74 @@ void mem_event_get_response(struct mem_e front_ring->sring->rsp_event = rsp_cons + 1; mem_event_ring_unlock(med); + + return 1; } -void mem_event_unpause_vcpus(struct domain *d) +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain *med) { struct vcpu *v; + int free, i, j, k; + int online = d->max_vcpus; + if ( !med->blocked ) + return; + + mem_event_ring_lock(med); + free = mem_event_ring_free(d, med); + + /* + * We ensure that we only have vCPUs online if there are enough free slots + * for their memory events to be processed. This will ensure that no + * memory events are lost (due to the fact that certain types of events + * cannot be replayed, we need to ensure that there is space in the ring + * for when they are hit). + * See large comment above in mem_event_put_request(). + */ for_each_vcpu ( d, v ) - if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) - vcpu_wake(v); + if ( test_bit(_VPF_mem_event, &v->pause_flags) ) + online--; + + /* We remember which vcpu last woke up to avoid scanning always linearly + * from xero and starving higher-numbered vcpus under high load */ + if ( d->vcpu ) + { + for (i = med->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++) + { + k = i % d->max_vcpus; + v = d->vcpu[k]; + if ( !v ) continue; + + if ( !(med->blocked) || online >= free ) + break; + if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) ) + { + vcpu_unpause(v); + online++; + med->blocked--; + med->last_vcpu_wake_up = k; + } + } + } + + mem_event_ring_unlock(med); } -void mem_event_mark_and_pause(struct vcpu *v) +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med) { - set_bit(_VPF_mem_event, &v->pause_flags); - vcpu_sleep_nosync(v); -} - -void mem_event_put_req_producers(struct mem_event_domain *med) -{ - mem_event_ring_lock(med); - med->req_producers--; - mem_event_ring_unlock(med); + if ( !test_and_set_bit(_VPF_mem_event, &v->pause_flags) ) + { + vcpu_pause_nosync(v); + med->blocked++; + } } int mem_event_check_ring(struct domain *d, struct mem_event_domain *med) { - struct vcpu *curr = current; - int free_requests; - int ring_full = 1; + if ( !med->ring_page ) + return -ENOSYS; - if ( !med->ring_page ) - return -1; - - mem_event_ring_lock(med); - - free_requests = RING_FREE_REQUESTS(&med->front_ring); - if ( med->req_producers < free_requests ) - { - med->req_producers++; - ring_full = 0; - } - - if ( ring_full && (curr->domain == d) ) - mem_event_mark_and_pause(curr); - - mem_event_ring_unlock(med); - - return ring_full; + return 0; } int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, diff -r 19a5a2cddad3 -r ee909e5a9d85 
xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -281,12 +281,19 @@ static struct page_info* mem_sharing_all vcpu_pause_nosync(v); req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED; - if(mem_event_check_ring(d, &d->mem_share)) return page; - req.gfn = gfn; req.p2mt = p2m_ram_shared; req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_share, &req); + + /* If there is no ring, and we''re out of memory, then + * there is no way out. */ + if ( mem_event_put_request(d, &d->mem_share, &req) == -ENOSYS ) + { + gdprintk(XENLOG_ERR, + "Failed alloc on unshare path for %hu and no ring " + "to upcall\n", d->domain_id); + domain_crash(d); + } return page; } @@ -300,12 +307,16 @@ int mem_sharing_sharing_resume(struct do { mem_event_response_t rsp; - /* Get request off the ring */ - mem_event_get_response(&d->mem_share, &rsp); + /* Get all requests off the ring */ + while ( mem_event_get_response(d, &d->mem_share, &rsp) ) + { + /* Unpause domain/vcpu */ + if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(d->vcpu[rsp.vcpu_id]); + } - /* Unpause domain/vcpu */ - if( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); + /* Unpause any domains that were paused because the ring was full */ + mem_event_unpause_vcpus(d, &d->mem_paging); return 0; } diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -877,7 +877,7 @@ void p2m_mem_paging_drop_page(struct dom struct vcpu *v = current; mem_event_request_t req; - /* Check that there''s space on the ring for this request */ + /* Check that there''s a paging listener who cares about this */ if ( mem_event_check_ring(d, &d->mem_paging) == 0) { /* Send release notification to pager */ @@ -886,7 +886,8 @@ void p2m_mem_paging_drop_page(struct dom req.gfn = gfn; req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_paging, &req); + /* If the event is lost, tough bananas */ + (void)mem_event_put_request(d, &d->mem_paging, &req); } } @@ -920,9 +921,14 @@ void p2m_mem_paging_populate(struct doma mfn_t mfn; struct p2m_domain *p2m = p2m_get_hostp2m(d); - /* Check that there''s space on the ring for this request */ + /* We''re paging. There should be a ring */ if ( mem_event_check_ring(d, &d->mem_paging) ) + { + gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring " + "in place\n", d->domain_id, gfn); + domain_crash(d); return; + } memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_PAGING; @@ -951,7 +957,6 @@ void p2m_mem_paging_populate(struct doma else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged ) { /* gfn is already on its way back and vcpu is not paused */ - mem_event_put_req_producers(&d->mem_paging); return; } @@ -960,7 +965,14 @@ void p2m_mem_paging_populate(struct doma req.p2mt = p2mt; req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_paging, &req); + if ( mem_event_put_request(d, &d->mem_paging, &req) == -EBUSY) + { + /* The ring is full, so we unpause the vcpu (if we had paused + * it), and let it retry, once the consumer has made some space + * in the ring. 
Checks when pausing ensure this is a guest vcpu */ + if ( req.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(v); + } } /** @@ -1038,31 +1050,32 @@ void p2m_mem_paging_resume(struct domain p2m_access_t a; mfn_t mfn; - /* Pull the response off the ring */ - mem_event_get_response(&d->mem_paging, &rsp); + /* Pull all responses off the ring */ + while( mem_event_get_response(d, &d->mem_paging, &rsp) ) + { + /* Fix p2m entry if the page was not dropped */ + if ( !(rsp.flags & MEM_EVENT_FLAG_DROP_PAGE) ) + { + p2m_lock(p2m); + mfn = p2m->get_entry(p2m, rsp.gfn, &p2mt, &a, p2m_query, NULL); + /* Allow only pages which were prepared properly, or pages which + * were nominated but not evicted */ + if ( mfn_valid(mfn) && + (p2mt == p2m_ram_paging_in || p2mt == p2m_ram_paging_in_start) ) + { + set_p2m_entry(p2m, rsp.gfn, mfn, PAGE_ORDER_4K, p2m_ram_rw, a); + set_gpfn_from_mfn(mfn_x(mfn), rsp.gfn); + } + p2m_unlock(p2m); + } - /* Fix p2m entry if the page was not dropped */ - if ( !(rsp.flags & MEM_EVENT_FLAG_DROP_PAGE) ) - { - p2m_lock(p2m); - mfn = p2m->get_entry(p2m, rsp.gfn, &p2mt, &a, p2m_query, NULL); - /* Allow only pages which were prepared properly, or pages which - * were nominated but not evicted */ - if ( mfn_valid(mfn) && - (p2mt == p2m_ram_paging_in || p2mt == p2m_ram_paging_in_start) ) - { - set_p2m_entry(p2m, rsp.gfn, mfn, PAGE_ORDER_4K, p2m_ram_rw, a); - set_gpfn_from_mfn(mfn_x(mfn), rsp.gfn); - } - p2m_unlock(p2m); + /* Unpause domain */ + if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(d->vcpu[rsp.vcpu_id]); } - /* Unpause domain */ - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); - /* Unpause any domains that were paused because the ring was full */ - mem_event_unpause_vcpus(d); + mem_event_unpause_vcpus(d, &d->mem_paging); } void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, @@ -1073,7 +1086,6 @@ void p2m_mem_access_check(unsigned long unsigned long gfn = gpa >> PAGE_SHIFT; struct domain *d = v->domain; struct p2m_domain* p2m = p2m_get_hostp2m(d); - int res; mfn_t mfn; p2m_type_t p2mt; p2m_access_t p2ma; @@ -1091,17 +1103,14 @@ void p2m_mem_access_check(unsigned long p2m_unlock(p2m); /* Otherwise, check if there is a memory event listener, and send the message along */ - res = mem_event_check_ring(d, &d->mem_access); - if ( res < 0 ) + if ( mem_event_check_ring(d, &d->mem_access) == -ENOSYS ) { /* No listener */ if ( p2m->access_required ) { - printk(XENLOG_INFO - "Memory access permissions failure, no mem_event listener: pausing VCPU %d, dom %d\n", - v->vcpu_id, d->domain_id); - - mem_event_mark_and_pause(v); + gdprintk(XENLOG_INFO, "Memory access permissions failure, no mem_event " + "listener VCPU %d, dom %d\n", v->vcpu_id, d->domain_id); + domain_crash(v->domain); } else { @@ -1113,8 +1122,6 @@ void p2m_mem_access_check(unsigned long return; } - else if ( res > 0 ) - return; /* No space in buffer; VCPU paused */ memset(&req, 0, sizeof(req)); req.type = MEM_EVENT_TYPE_ACCESS; @@ -1135,9 +1142,12 @@ void p2m_mem_access_check(unsigned long req.vcpu_id = v->vcpu_id; - mem_event_put_request(d, &d->mem_access, &req); + /* If the ring was busy, this guest vcpu got marked and paused (again), so + * we can remove our unconditional vcpu pause. 
*/ + if ( mem_event_put_request(d, &d->mem_access, &req) == -EBUSY ) + vcpu_unpause(v); - /* VCPU paused, mem event request sent */ + /* VCPU paused */ } void p2m_mem_access_resume(struct p2m_domain *p2m) @@ -1145,15 +1155,17 @@ void p2m_mem_access_resume(struct p2m_do struct domain *d = p2m->domain; mem_event_response_t rsp; - mem_event_get_response(&d->mem_access, &rsp); - - /* Unpause domain */ - if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) - vcpu_unpause(d->vcpu[rsp.vcpu_id]); + /* Pull all responses off the ring */ + while( mem_event_get_response(d, &d->mem_access, &rsp) ) + { + /* Unpause domain */ + if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED ) + vcpu_unpause(d->vcpu[rsp.vcpu_id]); + } /* Unpause any domains that were paused because the ring was full or no listener * was available */ - mem_event_unpause_vcpus(d); + mem_event_unpause_vcpus(d, &d->mem_access); } diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/include/asm-x86/mem_event.h --- a/xen/include/asm-x86/mem_event.h +++ b/xen/include/asm-x86/mem_event.h @@ -25,12 +25,18 @@ #define __MEM_EVENT_H__ /* Pauses VCPU while marking pause flag for mem event */ -void mem_event_mark_and_pause(struct vcpu *v); +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med); int mem_event_check_ring(struct domain *d, struct mem_event_domain *med); -void mem_event_put_req_producers(struct mem_event_domain *med); -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req); -void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp); -void mem_event_unpause_vcpus(struct domain *d); +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain *med); + +/* Returns 0 on success, -ENOSYS if there is no ring, -EBUSY if there is + * a ring or no space. FOr success or -EBUSY. the vCPU is left blocked to + * ensure that the ring does not lose events. In general, put_request should + * not fail, as it attempts to ensure that each vCPU can put at least one req. */ +int mem_event_put_request(struct domain *d, struct mem_event_domain *med, + mem_event_request_t *req); +int mem_event_get_response(struct domain *d, struct mem_event_domain *med, + mem_event_response_t *rsp); int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl); diff -r 19a5a2cddad3 -r ee909e5a9d85 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -183,13 +183,16 @@ struct mem_event_domain { /* ring lock */ spinlock_t ring_lock; - unsigned int req_producers; /* shared page */ mem_event_shared_page_t *shared_page; /* shared ring page */ void *ring_page; /* front-end ring */ mem_event_front_ring_t front_ring; + /* the number of vCPUs blocked */ + unsigned int blocked; + /* The last vcpu woken up */ + unsigned int last_vcpu_wake_up; /* event channel port (vcpu0 only) */ int xen_port; }; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 2 of 4] Create a generic callback mechanism for Xen-bound event channels
xen/arch/ia64/vmx/vmx_init.c | 2 +- xen/arch/x86/hvm/hvm.c | 5 +- xen/arch/x86/mm/mem_event.c | 3 +- xen/common/event_channel.c | 75 ++++++++++++++++++++++++++++++++++--------- xen/include/xen/event.h | 5 ++- xen/include/xen/sched.h | 2 +- 6 files changed, 69 insertions(+), 23 deletions(-) For event channels for which Xen is the consumer, there currently is a single action. With this patch, we allow event channel creators to specify a generic callback (or no callback). Because the expectation is that there will be few callbacks, they are stored in a small table. Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Keir Fraser <keir@xen.org> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/arch/ia64/vmx/vmx_init.c --- a/xen/arch/ia64/vmx/vmx_init.c +++ b/xen/arch/ia64/vmx/vmx_init.c @@ -377,7 +377,7 @@ vmx_vcpu_initialise(struct vcpu *v) { struct vmx_ioreq_page *iorp = &v->domain->arch.hvm_domain.ioreq; - int rc = alloc_unbound_xen_event_channel(v, 0); + int rc = alloc_unbound_xen_event_channel(v, 0, NULL); if (rc < 0) return rc; v->arch.arch_vmx.xen_port = rc; diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -979,7 +979,7 @@ int hvm_vcpu_initialise(struct vcpu *v) goto fail3; /* Create ioreq event channel. */ - rc = alloc_unbound_xen_event_channel(v, 0); + rc = alloc_unbound_xen_event_channel(v, 0, NULL); if ( rc < 0 ) goto fail4; @@ -3531,7 +3531,8 @@ long do_hvm_op(unsigned long op, XEN_GUE for_each_vcpu ( d, v ) { int old_port, new_port; - new_port = alloc_unbound_xen_event_channel(v, a.value); + new_port = alloc_unbound_xen_event_channel( + v, a.value, NULL); if ( new_port < 0 ) { rc = new_port; diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -96,7 +96,8 @@ static int mem_event_enable(struct domai /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], - current->domain->domain_id); + current->domain->domain_id, + NULL); if ( rc < 0 ) goto err; diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/common/event_channel.c --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -57,6 +57,51 @@ goto out; \ } while ( 0 ) +#define consumer_is_xen(e) (!!(e)->xen_consumer) + +/* + * The function alloc_unbound_xen_event_channel() allows an arbitrary + * notifier function to be specified. However, very few unique functions + * are specified in practice, so to prevent bloating the evtchn structure + * with a pointer, we stash them dynamically in a small lookup array which + * can be indexed by a small integer. + */ +static xen_event_channel_notification_t xen_consumers[8]; + +/* Default notification action: wake up from wait_on_xen_event_channel(). */ +static void default_xen_notification_fn(struct vcpu *v, unsigned int port) +{ + /* Consumer needs notification only if blocked. */ + if ( test_and_clear_bit(_VPF_blocked_in_xen, &v->pause_flags) ) + vcpu_wake(v); +} + +/* + * Given a notification function, return the value to stash in + * the evtchn->xen_consumer field. 
+ */ +static uint8_t get_xen_consumer(xen_event_channel_notification_t fn) +{ + unsigned int i; + + if ( fn == NULL ) + fn = default_xen_notification_fn; + + for ( i = 0; i < ARRAY_SIZE(xen_consumers); i++ ) + { + if ( xen_consumers[i] == NULL ) + xen_consumers[i] = fn; + if ( xen_consumers[i] == fn ) + break; + } + + BUG_ON(i >= ARRAY_SIZE(xen_consumers)); + return i+1; +} + +/* Get the notification function for a given Xen-bound event channel. */ +#define xen_notification_fn(e) (xen_consumers[(e)->xen_consumer-1]) + static int evtchn_set_pending(struct vcpu *v, int port); static int virq_is_global(int virq) @@ -395,7 +440,7 @@ static long __evtchn_close(struct domain chn1 = evtchn_from_port(d1, port1); /* Guest cannot close a Xen-attached event channel. */ - if ( unlikely(chn1->consumer_is_xen) ) + if ( unlikely(consumer_is_xen(chn1)) ) { rc = -EINVAL; goto out; @@ -533,7 +578,7 @@ int evtchn_send(struct domain *d, unsign lchn = evtchn_from_port(ld, lport); /* Guest cannot send via a Xen-attached event channel. */ - if ( unlikely(lchn->consumer_is_xen) ) + if ( unlikely(consumer_is_xen(lchn)) ) { spin_unlock(&ld->event_lock); return -EINVAL; @@ -550,13 +595,8 @@ int evtchn_send(struct domain *d, unsign rport = lchn->u.interdomain.remote_port; rchn = evtchn_from_port(rd, rport); rvcpu = rd->vcpu[rchn->notify_vcpu_id]; - if ( rchn->consumer_is_xen ) - { - /* Xen consumers need notification only if they are blocked. */ - if ( test_and_clear_bit(_VPF_blocked_in_xen, - &rvcpu->pause_flags) ) - vcpu_wake(rvcpu); - } + if ( consumer_is_xen(rchn) ) + (*xen_notification_fn(rchn))(rvcpu, rport); else { evtchn_set_pending(rvcpu, rport); @@ -783,7 +823,7 @@ long evtchn_bind_vcpu(unsigned int port, chn = evtchn_from_port(d, port); /* Guest cannot re-bind a Xen-attached event channel. */ - if ( unlikely(chn->consumer_is_xen) ) + if ( unlikely(consumer_is_xen(chn)) ) { rc = -EINVAL; goto out; @@ -994,7 +1034,8 @@ long do_event_channel_op(int cmd, XEN_GU int alloc_unbound_xen_event_channel( - struct vcpu *local_vcpu, domid_t remote_domid) + struct vcpu *local_vcpu, domid_t remote_domid, + xen_event_channel_notification_t notification_fn) { struct evtchn *chn; struct domain *d = local_vcpu->domain; @@ -1007,7 +1048,7 @@ int alloc_unbound_xen_event_channel( chn = evtchn_from_port(d, port); chn->state = ECS_UNBOUND; - chn->consumer_is_xen = 1; + chn->xen_consumer = get_xen_consumer(notification_fn); chn->notify_vcpu_id = local_vcpu->vcpu_id; chn->u.unbound.remote_domid = remote_domid; @@ -1034,8 +1075,8 @@ void free_xen_event_channel( BUG_ON(!port_is_valid(d, port)); chn = evtchn_from_port(d, port); - BUG_ON(!chn->consumer_is_xen); - chn->consumer_is_xen = 0; + BUG_ON(!consumer_is_xen(chn)); + chn->xen_consumer = 0; spin_unlock(&d->event_lock); @@ -1059,7 +1100,7 @@ void notify_via_xen_event_channel(struct ASSERT(port_is_valid(ld, lport)); lchn = evtchn_from_port(ld, lport); - ASSERT(lchn->consumer_is_xen); + ASSERT(consumer_is_xen(lchn)); if ( likely(lchn->state == ECS_INTERDOMAIN) ) { @@ -1102,7 +1143,7 @@ void evtchn_destroy(struct domain *d) /* Close all existing event channels. 
*/ for ( i = 0; port_is_valid(d, i); i++ ) { - evtchn_from_port(d, i)->consumer_is_xen = 0; + evtchn_from_port(d, i)->xen_consumer = 0; (void)__evtchn_close(d, i); } @@ -1188,7 +1229,7 @@ static void domain_dump_evtchn_info(stru printk(" v=%d", chn->u.virq); break; } - printk(" x=%d\n", chn->consumer_is_xen); + printk(" x=%d\n", chn->xen_consumer); } spin_unlock(&d->event_lock); diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/include/xen/event.h --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -51,8 +51,11 @@ int evtchn_unmask(unsigned int port); void evtchn_move_pirqs(struct vcpu *v); /* Allocate/free a Xen-attached event channel port. */ +typedef void (*xen_event_channel_notification_t)( + struct vcpu *v, unsigned int port); int alloc_unbound_xen_event_channel( - struct vcpu *local_vcpu, domid_t remote_domid); + struct vcpu *local_vcpu, domid_t remote_domid, + xen_event_channel_notification_t notification_fn); void free_xen_event_channel( struct vcpu *local_vcpu, int port); diff -r ee909e5a9d85 -r b3cdfb5b76d0 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -47,7 +47,7 @@ struct evtchn #define ECS_VIRQ 5 /* Channel is bound to a virtual IRQ line. */ #define ECS_IPI 6 /* Channel is bound to a virtual IPI line. */ u8 state; /* ECS_* */ - u8 consumer_is_xen; /* Consumed by Xen or by guest? */ + u8 xen_consumer; /* Consumer in Xen, if any? (0 = send to guest) */ u16 notify_vcpu_id; /* VCPU for local delivery notification */ union { struct { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 3 of 4] Make the prototype of p2m_mem_access_resume consistent
xen/arch/x86/mm/mem_access.c | 3 +-- xen/arch/x86/mm/p2m.c | 3 +-- xen/include/asm-x86/p2m.h | 2 +- 3 files changed, 3 insertions(+), 5 deletions(-) Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Keir Fraser <keir@xen.org> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r b3cdfb5b76d0 -r 106296812c3f xen/arch/x86/mm/mem_access.c --- a/xen/arch/x86/mm/mem_access.c +++ b/xen/arch/x86/mm/mem_access.c @@ -29,13 +29,12 @@ int mem_access_domctl(struct domain *d, XEN_GUEST_HANDLE(void) u_domctl) { int rc; - struct p2m_domain *p2m = p2m_get_hostp2m(d); switch( mec->op ) { case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_RESUME: { - p2m_mem_access_resume(p2m); + p2m_mem_access_resume(d); rc = 0; } break; diff -r b3cdfb5b76d0 -r 106296812c3f xen/arch/x86/mm/p2m.c --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -1150,9 +1150,8 @@ void p2m_mem_access_check(unsigned long /* VCPU paused */ } -void p2m_mem_access_resume(struct p2m_domain *p2m) +void p2m_mem_access_resume(struct domain *d) { - struct domain *d = p2m->domain; mem_event_response_t rsp; /* Pull all responses off the ring */ diff -r b3cdfb5b76d0 -r 106296812c3f xen/include/asm-x86/p2m.h --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -493,7 +493,7 @@ static inline void p2m_mem_paging_popula void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla, bool_t access_r, bool_t access_w, bool_t access_x); /* Resumes the running of the VCPU, restarting the last instruction */ -void p2m_mem_access_resume(struct p2m_domain *p2m); +void p2m_mem_access_resume(struct domain *d); /* Set access type for a region of pfns. * If start_pfn == -1ul, sets the default access type */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andres Lagar-Cavilla
2011-Nov-14 21:58 UTC
[Xen-devel] [PATCH 4 of 4] Allow memevent responses to be signaled via the event channel
xen/arch/x86/mm/mem_event.c | 26 ++++++++++++++++++++------ 1 files changed, 20 insertions(+), 6 deletions(-) Don''t require a separate domctl to notify the memevent interface that an event has occured. This domctl can be taxing, particularly when you are scaling events and paging to many domains across a single system. Instead, we use the existing event channel to signal when we place something in the ring (as per normal ring operation). Signed-off-by: Adin Scannell <adin@scannell.ca> Signed-off-by: Keir Fraser <keir@xen.org> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 106296812c3f -r 08a6ea63be98 xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -37,9 +37,11 @@ #define mem_event_ring_lock(_med) spin_lock(&(_med)->ring_lock) #define mem_event_ring_unlock(_med) spin_unlock(&(_med)->ring_lock) -static int mem_event_enable(struct domain *d, - xen_domctl_mem_event_op_t *mec, - struct mem_event_domain *med) +static int mem_event_enable( + struct domain *d, + xen_domctl_mem_event_op_t *mec, + struct mem_event_domain *med, + xen_event_channel_notification_t notification_fn) { int rc; struct domain *dom_mem_event = current->domain; @@ -97,7 +99,7 @@ static int mem_event_enable(struct domai /* Allocate event channel */ rc = alloc_unbound_xen_event_channel(d->vcpu[0], current->domain->domain_id, - NULL); + notification_fn); if ( rc < 0 ) goto err; @@ -308,6 +310,18 @@ int mem_event_check_ring(struct domain * return 0; } +/* Registered with Xen-bound event channel for incoming notifications. */ +static void mem_paging_notification(struct vcpu *v, unsigned int port) +{ + p2m_mem_paging_resume(v->domain); +} + +/* Registered with Xen-bound event channel for incoming notifications. */ +static void mem_access_notification(struct vcpu *v, unsigned int port) +{ + p2m_mem_access_resume(v->domain); +} + int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec, XEN_GUEST_HANDLE(void) u_domctl) { @@ -369,7 +383,7 @@ int mem_event_domctl(struct domain *d, x if ( p2m->pod.entry_count ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, med, mem_paging_notification); } break; @@ -408,7 +422,7 @@ int mem_event_domctl(struct domain *d, x if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) break; - rc = mem_event_enable(d, mec, med); + rc = mem_event_enable(d, mec, med, mem_access_notification); } break; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Olaf Hering
2011-Nov-23 18:35 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
On Mon, Nov 14, Andres Lagar-Cavilla wrote:

I will do a more complete review later; a few comments below.

> + free_requests = RING_FREE_REQUESTS(&med->front_ring);
> + if ( unlikely(free_requests < d->max_vcpus) )
> + {
> + /* This may happen during normal operation (hopefully not often). */
> + gdprintk(XENLOG_INFO, "mem_event request slots for domain %d: %d\n",
> + d->domain_id, free_requests);

What is the purpose of this printk? The ring can and does fill up quickly with paging.

> + if ( mem_event_ring_free(d, med) == 0 )
> + {
> + /* This *may* happen, but only when foreign mappings generate events. */
> + gdprintk(XENLOG_WARNING, "possible lost mem_event for domain %d\n",
> + d->domain_id);

This printk is not needed; callers have to try again.

Olaf
Andres Lagar-Cavilla
2011-Nov-23 18:52 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
Olaf,

> On Mon, Nov 14, Andres Lagar-Cavilla wrote:
>
> I will do a more complete review later; a few comments below.
>
>> + free_requests = RING_FREE_REQUESTS(&med->front_ring);
>> + if ( unlikely(free_requests < d->max_vcpus) )
>> + {
>> + /* This may happen during normal operation (hopefully not often). */
>> + gdprintk(XENLOG_INFO, "mem_event request slots for domain %d: %d\n",
>> + d->domain_id, free_requests);
>
> What is the purpose of this printk? The ring can and does fill up quickly
> with paging.

Well, we can tone the printks down to debug level. I don't think they're unnecessary if they're made an optional debug tool.

Question: I have one vcpu; how do I fill up the ring quickly (outside of foreign mappings)?

Andres

>> + if ( mem_event_ring_free(d, med) == 0 )
>> + {
>> + /* This *may* happen, but only when foreign mappings generate events. */
>> + gdprintk(XENLOG_WARNING, "possible lost mem_event for domain %d\n",
>> + d->domain_id);
>
> This printk is not needed; callers have to try again.
>
> Olaf
Olaf Hering
2011-Nov-23 18:57 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
On Wed, Nov 23, Andres Lagar-Cavilla wrote:

> Well, we can tone the printks down to debug level. I don't think they're
> unnecessary if they're made an optional debug tool.

There is nothing to debug here, since the callers have to retry anyway.

> Question: I have one vcpu; how do I fill up the ring quickly (outside of
> foreign mappings)?

Have a balloon driver in the guest and balloon down by more than 64*PAGE_SIZE. This is the default at least in my setup, where the kernel driver releases some memory right away (I haven't checked where this is actually configured).

Olaf
Andres Lagar-Cavilla
2011-Nov-24 18:54 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
> On Wed, Nov 23, Andres Lagar-Cavilla wrote:
>
>> Well, we can tone the printks down to debug level. I don't think they're
>> unnecessary if they're made an optional debug tool.
>
> There is nothing to debug here, since the callers have to retry anyway.
>
>> Question: I have one vcpu; how do I fill up the ring quickly (outside of
>> foreign mappings)?
>
> Have a balloon driver in the guest and balloon down by more than
> 64*PAGE_SIZE. This is the default at least in my setup, where the kernel
> driver releases some memory right away (I haven't checked where this is
> actually configured).

I see: a guest can call decrease_reservation with an extent_order large enough to overflow the ring, no matter the size of the ring. Isn't preemption of this hypercall a better tactic than putting the vcpu on a wait-queue? It won't preclude the need for wait queues, but it feels like a much cleaner solution.

With retrying of foreign mappings in xc_map_foreign_bulk (and grants), I wonder if we should put events in the ring due to foreign mappings *at all* in the case of congestion. Eventually a retry will get to kick the pager.

Andres

> Olaf
Olaf Hering
2011-Nov-24 19:23 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
On Thu, Nov 24, Andres Lagar-Cavilla wrote:

> > On Wed, Nov 23, Andres Lagar-Cavilla wrote:
> >
> >> Well, we can tone the printks down to debug level. I don't think
> >> they're unnecessary if they're made an optional debug tool.
> >
> > There is nothing to debug here, since the callers have to retry anyway.
> >
> >> Question: I have one vcpu; how do I fill up the ring quickly (outside
> >> of foreign mappings)?
> >
> > Have a balloon driver in the guest and balloon down by more than
> > 64*PAGE_SIZE. This is the default at least in my setup, where the kernel
> > driver releases some memory right away (I haven't checked where this is
> > actually configured).
>
> I see: a guest can call decrease_reservation with an extent_order large
> enough to overflow the ring, no matter the size of the ring. Isn't
> preemption of this hypercall a better tactic than putting the vcpu on a
> wait-queue? It won't preclude the need for wait queues, but it feels like
> a much cleaner solution.

Yes, yesterday I was thinking about this as well. p2m_mem_paging_drop_page() should return -EBUSY, but currently not all callers of guest_remove_page() look at the exit code. Perhaps that can be fixed.

> With retrying of foreign mappings in xc_map_foreign_bulk (and grants), I
> wonder if we should put events in the ring due to foreign mappings *at
> all* in the case of congestion. Eventually a retry will get to kick the
> pager.

What do you mean by that?

Olaf
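A minimal sketch of the plumbing being suggested here -- purely illustrative, not a posted patch. It assumes p2m_mem_paging_drop_page() would return -EBUSY on ring congestion and that guest_remove_page() would propagate it; the field names follow xen/common/memory.c loosely, and the extent_order handling and other details are elided.

    /* Hypothetical: preempt decrease_reservation instead of losing a
     * paging drop event when the mem_event ring is congested. */
    static void decrease_reservation(struct memop_args *a)
    {
        unsigned long i;
        xen_pfn_t gmfn;

        for ( i = a->nr_done; i < a->nr_extents; i++ )
        {
            if ( hypercall_preempt_check() )
            {
                a->preempted = 1;
                goto out;
            }

            if ( unlikely(__copy_from_guest_offset(&gmfn, a->extent_list, i, 1)) )
                goto out;

            /* Proposed change: guest_remove_page() propagates -EBUSY from
             * p2m_mem_paging_drop_page(), so we back off and retry this
             * extent once the pager has drained the ring. */
            if ( guest_remove_page(a->domain, gmfn) == -EBUSY )
            {
                a->preempted = 1;
                goto out;
            }
        }

     out:
        a->nr_done = i;
    }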
Andres Lagar-Cavilla
2011-Nov-24 19:35 UTC
Re: [PATCH 1 of 4] Improve ring management for memory events
> On Thu, Nov 24, Andres Lagar-Cavilla wrote:
>
>> I see: a guest can call decrease_reservation with an extent_order large
>> enough to overflow the ring, no matter the size of the ring. Isn't
>> preemption of this hypercall a better tactic than putting the vcpu on a
>> wait-queue? It won't preclude the need for wait queues, but it feels
>> like a much cleaner solution.
>
> Yes, yesterday I was thinking about this as well.
> p2m_mem_paging_drop_page() should return -EBUSY, but currently not all
> callers of guest_remove_page() look at the exit code. Perhaps that can
> be fixed.
>
>> With retrying of foreign mappings in xc_map_foreign_bulk (and grants), I
>> wonder if we should put events in the ring due to foreign mappings *at
>> all* in the case of congestion. Eventually a retry will get to kick the
>> pager.
>
> What do you mean by that?

Thinking out loud here. In our ring management patch, we put a guest vcpu to sleep if space in the ring < d->max_vcpus. In the case of a foreign-mapping vcpu, we still allow it to put an event in the ring as long as there is any space. This can eventually fill up the ring and prevent guest vcpus from placing events.

However, we could prevent that by refusing the foreign event whenever accepting it would leave

    space_in_the_ring < (d->max_vcpus - ring->blocked)

This would ensure that no event caused by a guest vcpu is ever lost (we still need the preemption of decrease_reservation, though). Correctness is preserved for the foreign-mapping vcpu: it will retry its mapping, and eventually there will be space in the ring.

With this, we won't need wait-queues for ring management. Makes sense?

Andres

> Olaf
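For concreteness, a sketch of the admission check proposed above; the names are invented and this is not part of the posted series. A foreign vcpu's event is accepted only if, after taking one slot, every guest vcpu that is not already blocked could still place an event of its own.

    /* Hypothetical helper: would the ring still honour the per-vcpu
     * reservation after admitting one foreign-mapping event? */
    static int mem_event_space_for_foreign(struct domain *d,
                                           struct mem_event_domain *med)
    {
        int free = RING_FREE_REQUESTS(&med->front_ring);
        int unblocked = (int)d->max_vcpus - (int)med->blocked;

        /* Refuse if taking a slot would leave fewer free slots than
         * there are unblocked guest vcpus. */
        return ( free - 1 ) >= unblocked;
    }

mem_event_put_request() could consult this when current->domain != d and return -EAGAIN, leaving the retry to xc_map_foreign_bulk as described above.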