Andres Lagar-Cavilla
2011-Nov-23 04:58 UTC
Re: RFC: mem_event: use wait queue when ring is full
Olaf, two questions here

- do you have any insight for events caused by foreign mappings? Those
will be lost with a full ring, with or without wait queues

- we have posted a patch (twice) previously, with changes to ring
management, most importantly sending guest vcpus to sleep when space in
the ring is < d->max_vcpus. I see these two patches as complementary. What
is your take?

Thanks,
Andres

> Date: Tue, 22 Nov 2011 22:13:24 +0100
> From: Olaf Hering <olaf@aepfle.de>
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] [PATCH 2 of 3] RFC: mem_event: use wait queue when ring is full
>
> # HG changeset patch
> # User Olaf Hering <olaf@aepfle.de>
> # Date 1321996145 -3600
> # Node ID de6860cb9205b68d1287482288d1b7b9d0255609
> # Parent d347a8a36d2e7951f98a3d22866dce004484d95f
> RFC: mem_event: use wait queue when ring is full
>
> CAUTION: while the patch by itself is supposed to be complete,
> the added usage of waitqueues will lead to sudden host reboots!
>
> This change is based on an idea/patch from Adin Scannell.
>
> If the ring is full, put the current vcpu to sleep if it belongs to the
> target domain. The wakeup happens in the p2m_*_resume functions.
> A request from another domain will use the existing ->req_producers
> functionality because sleeping is not possible if the vcpu does not
> belong to the target domain.
>
> This change also fixes a bug in p2m_mem_paging_drop_page(). Up to now a
> full ring would lead to a harmless inconsistency in the pager.
>
> v2:
>  - p2m_mem_paging_populate: move vcpu_pause after put_request, otherwise
>    the vcpu will not wake_up after a wait_event because the pause_count
>    was increased twice. Fixes guest hangs.
>  - update free space check in _mem_event_put_request()
>  - simplify mem_event_put_request()
>
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/hvm/hvm.c
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4099,7 +4099,7 @@ static int hvm_memory_event_traps(long p
>          return 1;
>
>      rc = mem_event_check_ring(d, &d->mem_event->mem_access);
> -    if ( rc )
> +    if ( rc < 0 )
>          return rc;
>
>      memset(&req, 0, sizeof(req));
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/mm/mem_event.c
> --- a/xen/arch/x86/mm/mem_event.c
> +++ b/xen/arch/x86/mm/mem_event.c
> @@ -23,6 +23,7 @@
>
>  #include <asm/domain.h>
>  #include <xen/event.h>
> +#include <xen/wait.h>
>  #include <asm/p2m.h>
>  #include <asm/mem_event.h>
>  #include <asm/mem_paging.h>
> @@ -39,6 +40,7 @@
>
>  static int mem_event_enable(struct domain *d,
>                              xen_domctl_mem_event_op_t *mec,
> +                            int mem_event_bit,
>                              struct mem_event_domain *med)
>  {
>      int rc;
> @@ -107,8 +109,12 @@ static int mem_event_enable(struct domai
>
>      mem_event_ring_lock_init(med);
>
> +    med->mem_event_bit = mem_event_bit;
> +
> +    init_waitqueue_head(&med->wq);
> +
>      /* Wake any VCPUs paused for memory events */
> -    mem_event_unpause_vcpus(d);
> +    mem_event_unpause_vcpus(d, med);
>
>      return 0;
>
> @@ -124,6 +130,9 @@ static int mem_event_enable(struct domai
>
>  static int mem_event_disable(struct mem_event_domain *med)
>  {
> +    if (!list_empty(&med->wq.list))
> +        return -EBUSY;
> +
>      unmap_domain_page(med->ring_page);
>      med->ring_page = NULL;
>
> @@ -133,13 +142,21 @@ static int mem_event_disable(struct mem_
>      return 0;
>  }
>
> -void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req)
> +static int _mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req)
>  {
>      mem_event_front_ring_t *front_ring;
> +    int free_requests;
>      RING_IDX req_prod;
>
>      mem_event_ring_lock(med);
>
> +    free_requests = RING_FREE_REQUESTS(&med->front_ring);
> +    /* A request was eventually claimed in mem_event_check_ring() */
> +    if ((current->domain == d && free_requests <= med->req_producers) || !free_requests) {
> +        mem_event_ring_unlock(med);
> +        return 0;
> +    }
> +
>      front_ring = &med->front_ring;
>      req_prod = front_ring->req_prod_pvt;
>
> @@ -155,6 +172,20 @@ void mem_event_put_request(struct domain
>      mem_event_ring_unlock(med);
>
>      notify_via_xen_event_channel(d, med->xen_port);
> +
> +    return 1;
> +}
> +
> +void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req)
> +{
> +    /* Go to sleep if request came from guest */
> +    if (current->domain == d) {
> +        wait_event(med->wq, _mem_event_put_request(d, med, req));
> +        return;
> +    }
> +    /* Ring was full anyway, unable to sleep in non-guest context */
> +    if (!_mem_event_put_request(d, med, req))
> +        printk("Failed to put memreq: d %u t %x f %x gfn %lx\n",
> +               d->domain_id, req->type, req->flags, (unsigned long)req->gfn);
>  }
>
>  void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp)
> @@ -178,21 +209,27 @@ void mem_event_get_response(struct mem_e
>      mem_event_ring_unlock(med);
>  }
>
> -void mem_event_unpause_vcpus(struct domain *d)
> +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain *med)
>  {
>      struct vcpu *v;
>
>      for_each_vcpu ( d, v )
> -        if ( test_and_clear_bit(_VPF_mem_event, &v->pause_flags) )
> +        if ( test_and_clear_bit(med->mem_event_bit, &v->pause_flags) )
>              vcpu_wake(v);
>  }
>
> -void mem_event_mark_and_pause(struct vcpu *v)
> +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med)
>  {
> -    set_bit(_VPF_mem_event, &v->pause_flags);
> +    set_bit(med->mem_event_bit, &v->pause_flags);
>      vcpu_sleep_nosync(v);
>  }
>
> +/**
> + * mem_event_put_req_producers - Release a claimed slot
> + * @med: mem_event ring
> + *
> + * mem_event_put_req_producers() releases a claimed slot in the mem_event ring.
> + */
>  void mem_event_put_req_producers(struct mem_event_domain *med)
>  {
>      mem_event_ring_lock(med);
> @@ -200,9 +237,26 @@ void mem_event_put_req_producers(struct
>      mem_event_ring_unlock(med);
>  }
>
> +/**
> + * mem_event_check_ring - Check state of a mem_event ring
> + * @d: guest domain
> + * @med: mem_event ring
> + *
> + * Return codes: < 0: the ring is not yet configured
> + *                 0: the ring has some room
> + *               > 0: the ring is full
> + *
> + * mem_event_check_ring() checks the state of the given mem_event ring.
> + * If the current vcpu belongs to the guest domain, the function assumes that
> + * mem_event_put_request() will sleep until the ring has room again.
> + *
> + * If the current vcpu does not belong to the target domain the caller must try
> + * again until there is room. A slot is claimed and the caller can place a
> + * request. If the caller does not need to send a request, the claimed slot has
> + * to be released with mem_event_put_req_producers().
> + */
>  int mem_event_check_ring(struct domain *d, struct mem_event_domain *med)
>  {
> -    struct vcpu *curr = current;
>      int free_requests;
>      int ring_full = 1;
>
> @@ -218,8 +272,8 @@ int mem_event_check_ring(struct domain *
>          ring_full = 0;
>      }
>
> -    if ( ring_full && (curr->domain == d) )
> -        mem_event_mark_and_pause(curr);
> +    if ( current->domain == d )
> +        ring_full = 0;
>
>      mem_event_ring_unlock(med);
>
> @@ -287,7 +341,7 @@ int mem_event_domctl(struct domain *d, x
>          if ( p2m->pod.entry_count )
>              break;
>
> -        rc = mem_event_enable(d, mec, med);
> +        rc = mem_event_enable(d, mec, _VPF_me_mem_paging, med);
>      }
>      break;
>
> @@ -326,7 +380,7 @@ int mem_event_domctl(struct domain *d, x
>          if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
>              break;
>
> -        rc = mem_event_enable(d, mec, _VPF_me_mem_access, med);
>      }
>      break;
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/mm/mem_sharing.c
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -253,19 +253,11 @@ static void mem_sharing_audit(void)
>  #endif
>
>
> -static struct page_info* mem_sharing_alloc_page(struct domain *d,
> -                                                unsigned long gfn)
> +static void mem_sharing_notify_helper(struct domain *d, unsigned long gfn)
>  {
> -    struct page_info* page;
>      struct vcpu *v = current;
>      mem_event_request_t req;
>
> -    page = alloc_domheap_page(d, 0);
> -    if(page != NULL) return page;
> -
> -    memset(&req, 0, sizeof(req));
> -    req.type = MEM_EVENT_TYPE_SHARED;
> -
>      if ( v->domain != d )
>      {
>          /* XXX This path needs some attention.  For now, just fail foreign
> @@ -275,20 +267,20 @@ static struct page_info* mem_sharing_all
>          gdprintk(XENLOG_ERR,
>                   "Failed alloc on unshare path for foreign (%d) lookup\n",
>                   d->domain_id);
> -        return page;
> +        return;
>      }
>
> -    vcpu_pause_nosync(v);
> -    req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> +    if (mem_event_check_ring(d, &d->mem_event->mem_share) < 0)
> +        return;
>
> -    if(mem_event_check_ring(d, &d->mem_event->mem_share)) return page;
> -
> +    memset(&req, 0, sizeof(req));
> +    req.type = MEM_EVENT_TYPE_SHARED;
> +    req.flags = MEM_EVENT_FLAG_VCPU_PAUSED;
>      req.gfn = gfn;
>      req.p2mt = p2m_ram_shared;
>      req.vcpu_id = v->vcpu_id;
>      mem_event_put_request(d, &d->mem_event->mem_share, &req);
> -
> -    return page;
> +    vcpu_pause_nosync(v);
>  }
>
>  unsigned int mem_sharing_get_nr_saved_mfns(void)
> @@ -653,7 +645,7 @@ gfn_found:
>      if(ret == 0) goto private_page_found;
>
>      old_page = page;
> -    page = mem_sharing_alloc_page(d, gfn);
> +    page = alloc_domheap_page(d, 0);
>      if(!page)
>      {
>          /* We've failed to obtain memory for private page. Need to re-add the
> @@ -661,6 +653,7 @@ gfn_found:
>          list_add(&gfn_info->list, &hash_entry->gfns);
>          put_gfn(d, gfn);
>          shr_unlock();
> +        mem_sharing_notify_helper(d, gfn);
>          return -ENOMEM;
>      }
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -884,17 +884,13 @@ void p2m_mem_paging_drop_page(struct dom
>      struct vcpu *v = current;
>      mem_event_request_t req;
>
> -    /* Check that there's space on the ring for this request */
> -    if ( mem_event_check_ring(d, &d->mem_event->mem_paging) == 0)
> -    {
> -        /* Send release notification to pager */
> -        memset(&req, 0, sizeof(req));
> -        req.flags |= MEM_EVENT_FLAG_DROP_PAGE;
> -        req.gfn = gfn;
> -        req.vcpu_id = v->vcpu_id;
> +    /* Send release notification to pager */
> +    memset(&req, 0, sizeof(req));
> +    req.flags = MEM_EVENT_FLAG_DROP_PAGE;
> +    req.gfn = gfn;
> +    req.vcpu_id = v->vcpu_id;
>
> -        mem_event_put_request(d, &d->mem_event->mem_paging, &req);
> -    }
> +    mem_event_put_request(d, &d->mem_event->mem_paging, &req);
>  }
>
>  /**
> @@ -952,7 +948,6 @@ void p2m_mem_paging_populate(struct doma
>      /* Pause domain if request came from guest and gfn has paging type */
>      if ( p2m_is_paging(p2mt) && v->domain == d )
>      {
> -        vcpu_pause_nosync(v);
>          req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
>      }
>      /* No need to inform pager if the gfn is not in the page-out path */
> @@ -969,6 +964,10 @@ void p2m_mem_paging_populate(struct doma
>      req.vcpu_id = v->vcpu_id;
>
>      mem_event_put_request(d, &d->mem_event->mem_paging, &req);
> +
> +    /* Pause guest after put_request to allow wake_up after wait_event */
> +    if ( req.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
> +        vcpu_pause_nosync(v);
>  }
>
>  /**
> @@ -1071,8 +1070,8 @@ void p2m_mem_paging_resume(struct domain
>      if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>          vcpu_unpause(d->vcpu[rsp.vcpu_id]);
>
> -    /* Unpause any domains that were paused because the ring was full */
> -    mem_event_unpause_vcpus(d);
> +    /* Unpause all vcpus that were paused because the ring was full */
> +    wake_up(&d->mem_event->mem_paging.wq);
>  }
>
>  void p2m_mem_access_check(unsigned long gpa, bool_t gla_valid, unsigned long gla,
> @@ -1111,7 +1110,7 @@ void p2m_mem_access_check(unsigned long
>                       "Memory access permissions failure, no mem_event listener: pausing VCPU %d, dom %d\n",
>                       v->vcpu_id, d->domain_id);
>
> -        mem_event_mark_and_pause(v);
> +        mem_event_mark_and_pause(v, &d->mem_event->mem_access);
>      }
>      else
>      {
> @@ -1131,8 +1130,7 @@ void p2m_mem_access_check(unsigned long
>      req.reason = MEM_EVENT_REASON_VIOLATION;
>
>      /* Pause the current VCPU unconditionally */
> -    vcpu_pause_nosync(v);
> -    req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> +    req.flags = MEM_EVENT_FLAG_VCPU_PAUSED;
>
>      /* Send request to mem event */
>      req.gfn = gfn;
> @@ -1148,6 +1146,7 @@ void p2m_mem_access_check(unsigned long
>      mem_event_put_request(d, &d->mem_event->mem_access, &req);
>
>      /* VCPU paused, mem event request sent */
> +    vcpu_pause_nosync(v);
>  }
>
>  void p2m_mem_access_resume(struct p2m_domain *p2m)
> @@ -1161,9 +1160,11 @@ void p2m_mem_access_resume(struct p2m_do
>      if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>          vcpu_unpause(d->vcpu[rsp.vcpu_id]);
>
> -    /* Unpause any domains that were paused because the ring was full or no listener
> -     * was available */
> -    mem_event_unpause_vcpus(d);
> +    /* Wakeup all vcpus waiting because the ring was full */
> +    wake_up(&d->mem_event->mem_access.wq);
> +
> +    /* Unpause all vcpus that were paused because no listener was available */
> +    mem_event_unpause_vcpus(d, &d->mem_event->mem_access);
>  }
>
>
> diff -r d347a8a36d2e -r de6860cb9205 xen/include/asm-x86/mem_event.h
> --- a/xen/include/asm-x86/mem_event.h
> +++ b/xen/include/asm-x86/mem_event.h
> @@ -25,12 +25,12 @@
>  #define __MEM_EVENT_H__
>
>  /* Pauses VCPU while marking pause flag for mem event */
> -void mem_event_mark_and_pause(struct vcpu *v);
> +void mem_event_mark_and_pause(struct vcpu *v, struct mem_event_domain *med);
>  int mem_event_check_ring(struct domain *d, struct mem_event_domain *med);
>  void mem_event_put_req_producers(struct mem_event_domain *med);
>  void mem_event_put_request(struct domain *d, struct mem_event_domain *med, mem_event_request_t *req);
>  void mem_event_get_response(struct mem_event_domain *med, mem_event_response_t *rsp);
> -void mem_event_unpause_vcpus(struct domain *d);
> +void mem_event_unpause_vcpus(struct domain *d, struct mem_event_domain *med);
>
>  int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
>                       XEN_GUEST_HANDLE(void) u_domctl);
> diff -r d347a8a36d2e -r de6860cb9205 xen/include/xen/sched.h
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -14,6 +14,7 @@
>  #include <xen/nodemask.h>
>  #include <xen/radix-tree.h>
>  #include <xen/multicall.h>
> +#include <xen/wait.h>
>  #include <public/xen.h>
>  #include <public/domctl.h>
>  #include <public/sysctl.h>
> @@ -192,6 +193,10 @@ struct mem_event_domain
>      mem_event_front_ring_t front_ring;
>      /* event channel port (vcpu0 only) */
>      int xen_port;
> +    /* mem_event bit for vcpu->pause_flags */
> +    int mem_event_bit;
> +    /* list of vcpus waiting for room in the ring */
> +    struct waitqueue_head wq;
>  };
>
>  struct mem_event_per_domain
> @@ -615,9 +620,12 @@ static inline struct domain *next_domain
>   /* VCPU affinity has changed: migrating to a new CPU. */
>  #define _VPF_migrating       3
>  #define VPF_migrating        (1UL<<_VPF_migrating)
> - /* VCPU is blocked on memory-event ring. */
> -#define _VPF_mem_event       4
> -#define VPF_mem_event        (1UL<<_VPF_mem_event)
> + /* VCPU is blocked on mem_paging ring. */
> +#define _VPF_me_mem_paging   4
> +#define VPF_me_mem_paging    (1UL<<_VPF_me_mem_paging)
> + /* VCPU is blocked on mem_access ring. */
> +#define _VPF_me_mem_access   5
> +#define VPF_me_mem_access    (1UL<<_VPF_me_mem_access)
>
>  static inline int vcpu_runnable(struct vcpu *v)
>  {
>
> ------------------------------
>
> Message: 3
> Date: Tue, 22 Nov 2011 22:15:19 +0100
> From: Olaf Hering <olaf@aepfle.de>
> To: Keir Fraser <keir.xen@gmail.com>
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
>
> On Tue, Nov 22, Olaf Hering wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> Could it have ended up on the waitqueue?
>>
>> Unlikely, but I will add checks for that as well.
>
> I posted three changes which make use of the wait queues.
> For some reason the code at the very end of p2m_mem_paging_populate()
> triggers when d is dom0, so its vcpu is put to sleep.
>
> Olaf
>
> ------------------------------
>
> Message: 4
> Date: Tue, 22 Nov 2011 21:53:35 +0000
> From: Keir Fraser <keir.xen@gmail.com>
> To: Olaf Hering <olaf@aepfle.de>
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
>
> On 22/11/2011 21:15, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Tue, Nov 22, Olaf Hering wrote:
>>
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> Could it have ended up on the waitqueue?
>>>
>>> Unlikely, but I will add checks for that as well.
>>
>> I posted three changes which make use of the wait queues.
>> For some reason the code at the very end of p2m_mem_paging_populate()
>> triggers when d is dom0, so its vcpu is put to sleep.
>
> We obviously can't have dom0 going to sleep on paging work. This, at
> least, isn't a wait-queue bug.
>
>> Olaf
>
> ------------------------------
>
> End of Xen-devel Digest, Vol 81, Issue 296
> ******************************************
On Tue, Nov 22, Andres Lagar-Cavilla wrote:

> Olaf, two questions here
>
> - do you have any insight for events caused by foreign mappings? Those
> will be lost with a full ring, with or without wait queues

The callers of mem_event_check_ring() have to retry if the ring is full.
That's what happens with p2m_mem_paging_populate(): the callers return
-ENOENT and expect a retry at some later point.

> - we have posted a patch (twice) previously, with changes to ring
> management, most importantly sending guest vcpus to sleep when space in
> the ring is < d->max_vcpus. I see these two patches as complementary. What
> is your take?

I'm not proposing to include my patch as is, because it has one issue:
wake_up will start all waiting vcpus even if there is just a single slot
free in the ringbuffer. Your patch is better in this respect because only
a few will be started again.

I will send comments for it later.
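Regarding the foreign mappings question: the retry contract boils down to
something like the loop below. This is just an untested sketch, and
lookup_gfn() is an invented stand-in for whatever lookup the foreign
caller actually performs:

    /* Invented example of a foreign-domain caller.  If the gfn is paged
     * out, the lookup triggers p2m_mem_paging_populate() (which may find
     * the ring full and drop the request) and fails with -ENOENT; the
     * only option is to try again later, which also re-sends the event. */
    static int map_foreign_gfn(struct domain *d, unsigned long gfn)
    {
        int rc;

        do {
            rc = lookup_gfn(d, gfn);        /* invented helper */
            if ( rc == -ENOENT )
                process_pending_softirqs(); /* give the pager a chance */
        } while ( rc == -ENOENT );

        return rc;
    }

In practice the -ENOENT usually travels further up, to the hypercall
return value, and the tools do the retry.

Olaf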
On 23/11/2011 16:49, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Andres Lagar-Cavilla wrote:
>
>> Olaf, two questions here
>>
>> - do you have any insight for events caused by foreign mappings? Those
>> will be lost with a full ring, with or without wait queues
>
> The callers of mem_event_check_ring() have to retry if the ring is full.
> That's what happens with p2m_mem_paging_populate(): the callers return
> -ENOENT and expect a retry at some later point.
>
>> - we have posted a patch (twice) previously, with changes to ring
>> management, most importantly sending guest vcpus to sleep when space in
>> the ring is < d->max_vcpus. I see these two patches as complementary. What
>> is your take?
>
> I'm not proposing to include my patch as is, because it has one issue:
> wake_up will start all waiting vcpus even if there is just a single slot
> free in the ringbuffer. Your patch is better in this respect because only
> a few will be started again.

Do you need a wake_up_one() function?
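It would be easy enough to add. An untested sketch, written from memory
against common/wait.c (the exact field names are an assumption): it is
wake_up() stopped after the first waiter.

    /* Sketch only: like wake_up(), but wake just the head of the queue
     * instead of draining the whole list. */
    void wake_up_one(struct waitqueue_head *wq)
    {
        struct waitqueue_vcpu *wqv;

        spin_lock(&wq->lock);

        if ( !list_empty(&wq->list) )
        {
            wqv = list_entry(wq->list.next, struct waitqueue_vcpu, list);
            list_del_init(&wqv->list);
            vcpu_unpause(wqv->vcpu);
        }

        spin_unlock(&wq->lock);
    }

The caller would still have to guarantee that the woken vcpu can make
progress, or it simply goes back to sleep.

 -- Keir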
On Wed, Nov 23, Keir Fraser wrote:

> On 23/11/2011 16:49, "Olaf Hering" <olaf@aepfle.de> wrote:
> > I'm not proposing to include my patch as is, because it has one issue:
> > wake_up will start all waiting vcpus even if there is just a single slot
> > free in the ringbuffer. Your patch is better in this respect because only
> > a few will be started again.
>
> Do you need a wake_up_one() function?

I'm not sure. In case of paging, if gfn 1234 is missing, several vcpus may
wait for it to come back. Other vcpus may wait for other gfns. So if we
had a struct waitqueue_head for gfn 1234, and one for each other
individual gfn, that would help for this case. Since there can't be more
gfns to wait for than the available guest vcpus, a list of
waitqueue_heads, each with a gfn attached to it, would work.

If that sounds too crazy, or if I'm on the wrong track, how would you do
that? (Perhaps I should prepare a patch to demonstrate what I mean.)
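As a first approximation, something along these lines. Completely
untested sketch, all names are invented, and it assumes some INVALID_GFN
marker for free slots:

    /* One waitqueue per gfn that is currently being paged in.  Since at
     * most d->max_vcpus distinct gfns can be waited on at any time, a
     * fixed pool of that size is sufficient. */
    struct gfn_waitqueue {
        unsigned long gfn;          /* INVALID_GFN marks a free slot */
        struct waitqueue_head wq;
    };

    /* Return the waitqueue for gfn, claiming a free slot on first use.
     * Caller holds the ring lock, which also protects this pool. */
    static struct gfn_waitqueue *find_gfn_wq(struct mem_event_domain *med,
                                             unsigned long gfn)
    {
        unsigned int i;
        struct gfn_waitqueue *free = NULL;

        for ( i = 0; i < med->nr_gfn_wq; i++ )      /* invented field */
        {
            if ( med->gfn_wq[i].gfn == gfn )
                return &med->gfn_wq[i];
            if ( !free && med->gfn_wq[i].gfn == INVALID_GFN )
                free = &med->gfn_wq[i];
        }
        if ( free )
            free->gfn = gfn;
        return free;
    }

p2m_mem_paging_resume() would then look up the gfn from the response and
wake_up() just that queue, instead of waking every sleeper.

Olaf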