Hi,

As promised, a revised version of the waitq-on-missing-pages patch.
I've cut it back a fair bit; I think the resulting patch is maybe a bit
less efficient, but easier to understand.

I've smoke-tested it a bit and haven't seen anything catch fire, but I
suspect the VMs aren't very happy. I'll debug more thoroughly when I'm
back in the office next week, but would really like some feedback from
the rest of you in the meantime.

Cheers,

Tim.

# HG changeset patch
# User Tim Deegan <tim@xen.org>
# Parent 01c1bcbc62224833304ede44187400f65e8a6b4c
[RFC] x86/mm: use wait queues for mem_paging

Use a wait queue to put a guest vcpu to sleep while the requested gfn is
in paging state. This adds missing p2m_mem_paging_populate() calls to
some callers of the new get_gfn* variants, which would crash now
because they get an invalid mfn. It also fixes guest crashes due to
unexpected returns from do_memory_op because copy_to/from_guest ran into
a paged gfn. Now those places will always get a valid mfn.

This is based on an earlier RFC patch by Olaf Hering, but heavily
simplified (removing a per-gfn queue of waiting vcpus in favour of
a scan of all vcpus on page-in).
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Tim Deegan <tim@xen.org>

diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Thu Feb 16 08:48:23 2012 +0100
+++ b/xen/arch/x86/mm/p2m.c	Thu Feb 16 14:39:13 2012 +0000
@@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
     }
 
     /* For now only perform locking on hap domains */
-    if ( locked && (hap_enabled(p2m->domain)) )
+    locked = locked && hap_enabled(p2m->domain);
+
+#ifdef __x86_64__
+again:
+#endif
+    if ( locked )
         /* Grab the lock here, don't release until put_gfn */
         gfn_lock(p2m, gfn, 0);
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
 
 #ifdef __x86_64__
+    if ( p2m_is_paging(*t) && q != p2m_query
+         && p2m->domain == current->domain )
+    {
+        if ( locked )
+            gfn_unlock(p2m, gfn, 0);
+
+        /* Ping the pager */
+        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
+            p2m_mem_paging_populate(p2m->domain, gfn);
+
+        /* Safety catch: it _should_ be safe to wait here but if it's not,
+         * crash the VM, not the host */
+        if ( in_atomic() )
+        {
+            WARN();
+            domain_crash(p2m->domain);
+        }
+        else
+        {
+            current->arch.mem_paging_gfn = gfn;
+            wait_event(current->arch.mem_paging_wq, ({
+                mfn = p2m->get_entry(p2m, gfn, t, a,
+                                     p2m_query, page_order);
+                (*t != p2m_ram_paging_in);
+            }));
+            goto again;
+        }
+    }
+#endif
+
+#ifdef __x86_64__
     if ( q == p2m_unshare && p2m_is_shared(*t) )
     {
         ASSERT(!p2m_is_nestedp2m(p2m));
@@ -942,25 +978,25 @@ void p2m_mem_paging_drop_page(struct dom
  * This function needs to be called whenever gfn_to_mfn() returns any of the p2m
  * paging types because the gfn may not be backed by a mfn.
  *
- * The gfn can be in any of the paging states, but the pager needs only be
- * notified when the gfn is in the paging-out path (paging_out or paged). This
- * function may be called more than once from several vcpus. If the vcpu belongs
- * to the guest, the vcpu must be stopped and the pager notified that the vcpu
- * was stopped. The pager needs to handle several requests for the same gfn.
+ * The gfn can be in any of the paging states, but the pager needs only
+ * be notified when the gfn is in the paging-out path (paging_out or
+ * paged). This function may be called more than once from several
+ * vcpus. The pager needs to handle several requests for the same gfn.
  *
- * If the gfn is not in the paging-out path and the vcpu does not belong to the
- * guest, nothing needs to be done and the function assumes that a request was
- * already sent to the pager. In this case the caller has to try again until the
- * gfn is fully paged in again.
+ * If the gfn is not in the paging-out path nothing needs to be done and
+ * the function assumes that a request was already sent to the pager.
+ * In this case the caller has to try again until the gfn is fully paged
+ * in again.
  */
 void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
 {
     struct vcpu *v = current;
-    mem_event_request_t req;
+    mem_event_request_t req = {0};
     p2m_type_t p2mt;
     p2m_access_t a;
     mfn_t mfn;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int send_request = 0;
 
     /* We're paging. There should be a ring */
     int rc = mem_event_claim_slot(d, &d->mem_event->paging);
@@ -974,9 +1010,6 @@ void p2m_mem_paging_populate(struct doma
     else if ( rc < 0 )
         return;
 
-    memset(&req, 0, sizeof(req));
-    req.type = MEM_EVENT_TYPE_PAGING;
-
     /* Fix p2m mapping */
     gfn_lock(p2m, gfn, 0);
     mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL);
@@ -986,26 +1019,29 @@ void p2m_mem_paging_populate(struct doma
         /* Evict will fail now, tag this request for pager */
         if ( p2mt == p2m_ram_paging_out )
             req.flags |= MEM_EVENT_FLAG_EVICT_FAIL;
-
-        set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
+        if ( p2mt == p2m_ram_paging_out && mfn_valid(mfn) && v->domain == d )
+            /* Short-cut back to paged-in state (but not for foreign
+             * mappings, or the pager couldn't map it to page it out) */
+            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
+                          paging_mode_log_dirty(d)
+                          ? p2m_ram_logdirty : p2m_ram_rw, a);
+        else
+        {
+            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
+            send_request = 1;
+        }
     }
     gfn_unlock(p2m, gfn, 0);
 
-    /* Pause domain if request came from guest and gfn has paging type */
-    if ( p2m_is_paging(p2mt) && v->domain == d )
+    /* No need to inform pager if the gfn is not in the page-out path */
+    if ( !send_request )
     {
-        vcpu_pause_nosync(v);
-        req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
-    }
-    /* No need to inform pager if the gfn is not in the page-out path */
-    else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged )
-    {
-        /* gfn is already on its way back and vcpu is not paused */
         mem_event_cancel_slot(d, &d->mem_event->paging);
         return;
     }
 
     /* Send request to pager */
+    req.type = MEM_EVENT_TYPE_PAGING;
     req.gfn = gfn;
     req.p2mt = p2mt;
     req.vcpu_id = v->vcpu_id;
@@ -1122,6 +1158,7 @@ void p2m_mem_paging_resume(struct domain
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     mem_event_response_t rsp;
+    struct vcpu *v;
     p2m_type_t p2mt;
     p2m_access_t a;
     mfn_t mfn;
@@ -1147,9 +1184,10 @@ void p2m_mem_paging_resume(struct domain
             }
             gfn_unlock(p2m, rsp.gfn, 0);
         }
-        /* Unpause domain */
-        if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
-            vcpu_unpause(d->vcpu[rsp.vcpu_id]);
+        /* Wake any vcpus that were waiting for this GFN */
+        for_each_vcpu ( d, v )
+            if ( v->arch.mem_paging_gfn == rsp.gfn )
+                wake_up_all(&v->arch.mem_paging_wq);
     }
 }

diff -r 01c1bcbc6222 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Thu Feb 16 08:48:23 2012 +0100
+++ b/xen/include/asm-x86/domain.h	Thu Feb 16 14:39:13 2012 +0000
@@ -4,6 +4,7 @@
 #include <xen/config.h>
 #include <xen/mm.h>
 #include <xen/radix-tree.h>
+#include <xen/wait.h>
 #include <asm/hvm/vcpu.h>
 #include <asm/hvm/domain.h>
 #include <asm/e820.h>
@@ -491,6 +492,12 @@ struct arch_vcpu
 
     struct paging_vcpu paging;
 
+#ifdef CONFIG_X86_64
+    /* Mem-paging: this vcpu is waiting for a gfn to be paged in */
+    struct waitqueue_head mem_paging_wq;
+    unsigned long mem_paging_gfn;
+#endif
+
 #ifdef CONFIG_X86_32
     /* map_domain_page() mapping cache. */
     struct mapcache_vcpu mapcache;
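The commit message describes the key simplification: instead of keeping a per-gfn queue of sleeping vcpus, the page-in path scans every vcpu of the domain and wakes any whose recorded gfn matches. As a minimal standalone model of that design choice (the `toy_*` names, the fixed-size array, and the `woken` flag are stand-ins invented here, not Xen's types):

```c
#include <assert.h>
#include <stddef.h>

#define NR_TOY_VCPUS 4
#define TOY_INVALID_GFN (~0UL)

/* Toy stand-in for struct vcpu: just the fields this patch adds. */
struct toy_vcpu {
    unsigned long mem_paging_gfn;  /* gfn this vcpu went to sleep on */
    int woken;                     /* stands in for wake_up_all() firing */
};

/* Model of the loop added to p2m_mem_paging_resume(): on page-in of
 * rsp_gfn, scan all vcpus and wake every one waiting on that gfn. */
static void toy_paging_resume(struct toy_vcpu *vcpus, size_t n,
                              unsigned long rsp_gfn)
{
    for (size_t i = 0; i < n; i++)
        if (vcpus[i].mem_paging_gfn == rsp_gfn)
            vcpus[i].woken = 1;
}
```

The trade-off is an O(nr_vcpus) scan on every page-in instead of per-gfn queue bookkeeping; spurious wakeups are harmless because `wait_event()` re-checks its condition and goes back to sleep if the gfn is still in flight.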
Andres Lagar-Cavilla
2012-Feb-17 16:05 UTC
Re: [RFC] [PATCH] x86/mm: use wait queues for mem_paging
>
> On Feb 16, 2012, at 10:20 AM, Tim Deegan wrote:
>
>> Hi,
>>
>> As promised, a revised version of the waitq-on-missing-pages patch.
>> I've cut it back a fair bit; I think the resulting patch is maybe a bit
>> less efficient, but easier to understand.
>>
>> I've smoke-tested it a bit and haven't seen anything catch fire but I
>> suspect the VMs aren't very happy. I'll debug more thoroughly when I'm
>> back in the office next week, but would really like some feedback from
>> the rest of you in the meantime.

Well, thanks for taking a stab at it. Looks fairly well. My main
observation is that we do not want to unconditionally go on a wait
queue everywhere. For example, the grant table code pretty reliably
unwinds itself and correctly tells the grant mapper to retry, in the
presence of a paged page. The same could be said of emulation
(X86_EMUL_RETRY). And certainly the balloon code should not sleep
waiting for populate!

Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
wait queue code is invoked only there. And then we can call
get_gfn_sleep in vmx_load_pdptrs, in guest page table walks, and in
__hvm_copy.

I understand that it is cleaner not to have to make that distinction,
but I don't want to fix things that aren't broken :)

More comments inline

>>
>> Cheers,
>>
>> Tim.
>>
>> # HG changeset patch
>> # User Tim Deegan <tim@xen.org>
>> # Parent 01c1bcbc62224833304ede44187400f65e8a6b4c
>> [RFC] x86/mm: use wait queues for mem_paging
>>
>> Use a wait queue to put a guest vcpu to sleep while the requested gfn is
>> in paging state. This adds missing p2m_mem_paging_populate() calls to
>> some callers of the new get_gfn* variants, which would crash now
>> because they get an invalid mfn. It also fixes guest crashes due to
>> unexpected returns from do_memory_op because copy_to/from_guest ran into
>> a paged gfn. Now those places will always get a valid mfn.
>>
>> This is based on an earlier RFC patch by Olaf Hering, but heavily
>> simplified (removing a per-gfn queue of waiting vcpus in favour of
>> a scan of all vcpus on page-in).
>>
>> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>> Signed-off-by: Tim Deegan <tim@xen.org>
>>
>> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
>> --- a/xen/arch/x86/mm/p2m.c	Thu Feb 16 08:48:23 2012 +0100
>> +++ b/xen/arch/x86/mm/p2m.c	Thu Feb 16 14:39:13 2012 +0000
>> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
>>      }
>>
>>      /* For now only perform locking on hap domains */
>> -    if ( locked && (hap_enabled(p2m->domain)) )
>> +    locked = locked && hap_enabled(p2m->domain);
>> +
>> +#ifdef __x86_64__

When are we getting rid of 32 bit hypervisor? (half kidding. Only half)

>> +again:
>> +#endif
>> +    if ( locked )
>>          /* Grab the lock here, don't release until put_gfn */
>>          gfn_lock(p2m, gfn, 0);
>>
>>      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
>>
>>  #ifdef __x86_64__
>> +    if ( p2m_is_paging(*t) && q != p2m_query
>> +         && p2m->domain == current->domain )
>> +    {
>> +        if ( locked )
>> +            gfn_unlock(p2m, gfn, 0);
>> +
>> +        /* Ping the pager */
>> +        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
>> +            p2m_mem_paging_populate(p2m->domain, gfn);

You want to make sure the mfn is not valid. For p2m_ram_paging_out we
still have the mfn in place.

>> +
>> +        /* Safety catch: it _should_ be safe to wait here but if it's not,
>> +         * crash the VM, not the host */
>> +        if ( in_atomic() )
>> +        {
>> +            WARN();
>> +            domain_crash(p2m->domain);
>> +        }
>> +        else
>> +        {
>> +            current->arch.mem_paging_gfn = gfn;
>> +            wait_event(current->arch.mem_paging_wq, ({
>> +                mfn = p2m->get_entry(p2m, gfn, t, a,
>> +                                     p2m_query, page_order);
>> +                (*t != p2m_ram_paging_in);

Well, first of all the p2m->get_entry will not happen in a locked
context, so you will exit the wait loop not holding the p2m lock and
crash later. So you want __get_gfn_type_access with q = p2m_query, and
then put_gfn if the condition fails. Probably not gonna fit in a ({})
block ;)

I think checking for (!p2m_is_paging(*t)) is more appropriate. Or even
(!p2m_is_paging(*t) && mfn_valid(mfn)). Although I'm mildly hesitant
about the latter because conceivably another vcpu may beat us to
grabbing the p2m lock after the gfn is populated, and do something to
its mfn.

>> +            }));
>> +            goto again;
>> +        }
>> +    }
>> +#endif
>> +
>> +#ifdef __x86_64__
>>      if ( q == p2m_unshare && p2m_is_shared(*t) )
>>      {
>>          ASSERT(!p2m_is_nestedp2m(p2m));
>> @@ -942,25 +978,25 @@ void p2m_mem_paging_drop_page(struct dom
>>   * This function needs to be called whenever gfn_to_mfn() returns any of the p2m
>>   * paging types because the gfn may not be backed by a mfn.
>>   *
>> - * The gfn can be in any of the paging states, but the pager needs only be
>> - * notified when the gfn is in the paging-out path (paging_out or paged). This
>> - * function may be called more than once from several vcpus. If the vcpu belongs
>> - * to the guest, the vcpu must be stopped and the pager notified that the vcpu
>> - * was stopped. The pager needs to handle several requests for the same gfn.
>> + * The gfn can be in any of the paging states, but the pager needs only
>> + * be notified when the gfn is in the paging-out path (paging_out or
>> + * paged). This function may be called more than once from several
>> + * vcpus. The pager needs to handle several requests for the same gfn.
>>   *
>> - * If the gfn is not in the paging-out path and the vcpu does not belong to the
>> - * guest, nothing needs to be done and the function assumes that a request was
>> - * already sent to the pager. In this case the caller has to try again until the
>> - * gfn is fully paged in again.
>> + * If the gfn is not in the paging-out path nothing needs to be done and
>> + * the function assumes that a request was already sent to the pager.
>> + * In this case the caller has to try again until the gfn is fully paged
>> + * in again.
>>   */
>>  void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
>>  {
>>      struct vcpu *v = current;
>> -    mem_event_request_t req;
>> +    mem_event_request_t req = {0};
>>      p2m_type_t p2mt;
>>      p2m_access_t a;
>>      mfn_t mfn;
>>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int send_request = 0;
>>
>>      /* We're paging. There should be a ring */
>>      int rc = mem_event_claim_slot(d, &d->mem_event->paging);

Given all the optimizations around the send_request flag, it might be
worth moving the claim_slot call to an if(send_request) block at the
bottom, and get rid of cancel_slot altogether. Claim_slot could
potentially go (or cause someone else to go) to a wait queue, and it's
best to not be that rude.

>> @@ -974,9 +1010,6 @@ void p2m_mem_paging_populate(struct doma
>>      else if ( rc < 0 )
>>          return;
>>
>> -    memset(&req, 0, sizeof(req));
>> -    req.type = MEM_EVENT_TYPE_PAGING;
>> -
>>      /* Fix p2m mapping */
>>      gfn_lock(p2m, gfn, 0);
>>      mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL);
>> @@ -986,26 +1019,29 @@ void p2m_mem_paging_populate(struct doma
>>          /* Evict will fail now, tag this request for pager */
>>          if ( p2mt == p2m_ram_paging_out )
>>              req.flags |= MEM_EVENT_FLAG_EVICT_FAIL;
>> -
>> -        set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
>> +        if ( p2mt == p2m_ram_paging_out && mfn_valid(mfn) && v->domain == d )
>> +            /* Short-cut back to paged-in state (but not for foreign
>> +             * mappings, or the pager couldn't map it to page it out) */
>> +            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
>> +                          paging_mode_log_dirty(d)
>> +                          ? p2m_ram_logdirty : p2m_ram_rw, a);
>> +        else
>> +        {
>> +            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
>> +            send_request = 1;
>> +        }
>>      }
>>      gfn_unlock(p2m, gfn, 0);
>>
>> -    /* Pause domain if request came from guest and gfn has paging type */
>> -    if ( p2m_is_paging(p2mt) && v->domain == d )
>> +    /* No need to inform pager if the gfn is not in the page-out path */
>> +    if ( !send_request )
>>      {
>> -        vcpu_pause_nosync(v);

I see no sin in stacking vcpu_pauses. Plus, this will need to stay if
we (hopefully) adopt get_gfn_sleep.

>> -        req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
>> -    }
>> -    /* No need to inform pager if the gfn is not in the page-out path */
>> -    else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged )
>> -    {
>> -        /* gfn is already on its way back and vcpu is not paused */
>>          mem_event_cancel_slot(d, &d->mem_event->paging);
>>          return;
>>      }
>>
>>      /* Send request to pager */
>> +    req.type = MEM_EVENT_TYPE_PAGING;
>>      req.gfn = gfn;
>>      req.p2mt = p2mt;
>>      req.vcpu_id = v->vcpu_id;
>> @@ -1122,6 +1158,7 @@ void p2m_mem_paging_resume(struct domain
>>  {
>>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>      mem_event_response_t rsp;
>> +    struct vcpu *v;
>>      p2m_type_t p2mt;
>>      p2m_access_t a;
>>      mfn_t mfn;
>> @@ -1147,9 +1184,10 @@ void p2m_mem_paging_resume(struct domain
>>          }
>>          gfn_unlock(p2m, rsp.gfn, 0);
>>      }
>> -    /* Unpause domain */
>> -    if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>> -        vcpu_unpause(d->vcpu[rsp.vcpu_id]);

I see no sin in stacking vcpu pauses, redux...

That's all I have. Thanks!
Andres

>> +    /* Wake any vcpus that were waiting for this GFN */
>> +    for_each_vcpu ( d, v )
>> +        if ( v->arch.mem_paging_gfn == rsp.gfn )
>> +            wake_up_all(&v->arch.mem_paging_wq);
>>  }
>>  }
>>
>> diff -r 01c1bcbc6222 xen/include/asm-x86/domain.h
>> --- a/xen/include/asm-x86/domain.h	Thu Feb 16 08:48:23 2012 +0100
>> +++ b/xen/include/asm-x86/domain.h	Thu Feb 16 14:39:13 2012 +0000
>> @@ -4,6 +4,7 @@
>>  #include <xen/config.h>
>>  #include <xen/mm.h>
>>  #include <xen/radix-tree.h>
>> +#include <xen/wait.h>
>>  #include <asm/hvm/vcpu.h>
>>  #include <asm/hvm/domain.h>
>>  #include <asm/e820.h>
>> @@ -491,6 +492,12 @@ struct arch_vcpu
>>
>>      struct paging_vcpu paging;
>>
>> +#ifdef CONFIG_X86_64
>> +    /* Mem-paging: this vcpu is waiting for a gfn to be paged in */
>> +    struct waitqueue_head mem_paging_wq;
>> +    unsigned long mem_paging_gfn;
>> +#endif
>> +
>>  #ifdef CONFIG_X86_32
>>      /* map_domain_page() mapping cache. */
>>      struct mapcache_vcpu mapcache;
>
>
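Andres's suggestion about mem_event_claim_slot() can be sketched as a toy model contrasting the two shapes of p2m_mem_paging_populate(): claim a ring slot up front and cancel it when no request turns out to be needed, versus deciding first and only then claiming. The counters and `toy_*` helpers below are hypothetical stand-ins invented for illustration, not the Xen mem_event API:

```c
#include <assert.h>

static int slots_claimed;
static int slots_cancelled;
static int requests_sent;

/* Stand-ins for mem_event_claim_slot() / mem_event_put_request(). */
static int toy_claim_slot(void) { slots_claimed++; return 0; }
static void toy_send_request(void) { requests_sent++; }

/* Current shape: claim eagerly, cancel if we had nothing to send. */
static void populate_claim_first(int send_request)
{
    if (toy_claim_slot() < 0)
        return;
    if (!send_request) { slots_cancelled++; return; }
    toy_send_request();
}

/* Suggested shape: only claim once we know a request is needed, so the
 * cancel path (and any sleeping inside claim) disappears entirely. */
static void populate_claim_last(int send_request)
{
    if (!send_request)
        return;
    if (toy_claim_slot() < 0)
        return;
    toy_send_request();
}
```

The point of the second shape is not just tidiness: claim_slot can itself block (or push someone else) onto a wait queue when the ring is full, so it is better not to pay that cost for requests that end up cancelled.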
Hi,

At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
> Well, thanks for taking a stab at it. Looks fairly well. My main
> observation is that we do not want to unconditionally go on a wait queue
> everywhere. For example, the grant table code pretty reliably unwinds
> itself and correctly tells the grant mapper to retry, in the presence of
> a paged page.

The grant table code is unlikely to hit a paged-out gfn in the caller's
address space.

> The same could be said of emulation (X86_EMUL_RETRY).

Fair enough. But the emulator uses hvm_copy() in its callbacks anyway, so
we'd need another flag to say 'special handling for p2m fixups plz' in
all the hvm_copy() interfaces. Unpleasant!

> And certainly the balloon code should not sleep waiting for populate!

The balloon code should not be trying to populate at all!

> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
> wait queue code is invoked only there. And then we can call
> get_gfn_sleep in vmx_load_pdptrs, in guest page table walks, and in
> __hvm_copy.

I see. In the longer term I definitely do not want get_gfn()'s callers
to have to deal with p2m_is_paged(), and PoD, and sharing, and every
other p2m hack we come up with. I want all that stuff hidden behind
get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.

I don't like get_gfn_sleep() very much but I'll see if I can find a way
to do the equivalent with a flag to get_gfn_*(). But I suspect it will
end up with a tri-state argument of:
 a) just give me the current state;
 b) try (without sleeping) to fix up paging, PoD and sharing but
    still maybe make me handle it myself by error propagation; and
 c) just fix it up.

'b' is a pretty weird interface. Specially given that in corner cases,
'make me handle it' might involve hitting a wait queue anyway. If we
intend in the fullness of time to get rid of it (as I hope we would)
then probably we shouldn't introduce it. (And if that means pushing
this whole change out past 4.2, we should consider doing that.)

> >> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
> >> --- a/xen/arch/x86/mm/p2m.c	Thu Feb 16 08:48:23 2012 +0100
> >> +++ b/xen/arch/x86/mm/p2m.c	Thu Feb 16 14:39:13 2012 +0000
> >> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
> >>      }
> >>
> >>      /* For now only perform locking on hap domains */
> >> -    if ( locked && (hap_enabled(p2m->domain)) )
> >> +    locked = locked && hap_enabled(p2m->domain);
> >> +
> >> +#ifdef __x86_64__
> When are we getting rid of 32 bit hypervisor? (half kidding. Only half)

Can't be soon enough for me. I've been sharpening the axe for that one
for years now. :)

> >> +again:
> >> +#endif
> >> +    if ( locked )
> >>          /* Grab the lock here, don't release until put_gfn */
> >>          gfn_lock(p2m, gfn, 0);
> >>
> >>      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
> >>
> >>  #ifdef __x86_64__
> >> +    if ( p2m_is_paging(*t) && q != p2m_query
> >> +         && p2m->domain == current->domain )
> >> +    {
> >> +        if ( locked )
> >> +            gfn_unlock(p2m, gfn, 0);
> >> +
> >> +        /* Ping the pager */
> >> +        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
> >> +            p2m_mem_paging_populate(p2m->domain, gfn);
>
> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
> still have the mfn in place.

Doesn't p2m_mem_paging_populate already DTRT in all these cases? If
not, it really should.

> >> +
> >> +        /* Safety catch: it _should_ be safe to wait here but if it's not,
> >> +         * crash the VM, not the host */
> >> +        if ( in_atomic() )
> >> +        {
> >> +            WARN();
> >> +            domain_crash(p2m->domain);
> >> +        }
> >> +        else
> >> +        {
> >> +            current->arch.mem_paging_gfn = gfn;
> >> +            wait_event(current->arch.mem_paging_wq, ({
> >> +                mfn = p2m->get_entry(p2m, gfn, t, a,
> >> +                                     p2m_query, page_order);
> >> +                (*t != p2m_ram_paging_in);
>
> Well, first of all the p2m->get_entry will not happen in a locked
> context, so you will exit the wait loop not holding the p2m lock and
> crash later.

Yes - we always exit the loop without the lock; then we 'goto again' and
do things properly. I realise it's a bit inefficient, but on the
page-in path the p2m lookups shouldn't be the bottleneck. :) I did
write the version that handled the locking in this loop but it got a bit
twisty, plus we need the 'goto again' for other reasons (see below).

As a follow-up I ought to make the page-sharing fixup code that's just
below this use 'goto again' as well; we shouldn't really leave this
function until we get a lookup where all these cases are dealt with.

> So you want __get_gfn_type_access with q = p2m_query, and then put_gfn
> if the condition fails. Probably not gonna fit in a ({}) block ;)
>
> I think checking for (!p2m_is_paging(*t)) is more appropriate.

No, that was deliberate, and IIRC a change from Olaf's patch. If the
page gets paged in and back out while we're asleep, the type goes back
to p2m_ram_paged, and in that case we _must_ break out and retry or this
vcpu might stall forever.

> >>  void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >>  {
> >>      struct vcpu *v = current;
> >> -    mem_event_request_t req;
> >> +    mem_event_request_t req = {0};
> >>      p2m_type_t p2mt;
> >>      p2m_access_t a;
> >>      mfn_t mfn;
> >>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    int send_request = 0;
> >>
> >>      /* We're paging. There should be a ring */
> >>      int rc = mem_event_claim_slot(d, &d->mem_event->paging);
>
> Given all the optimizations around the send_request flag, it might be
> worth moving the claim_slot call to an if(send_request) block at the
> bottom, and get rid of cancel_slot altogether. Claim_slot could
> potentially go (or cause someone else to go) to a wait queue, and it's
> best to not be that rude.

Sure. It might be tricky to make sure we unwind properly in the failure
case, but I'll have a go at a follow-up patch that looks at that.

(This is not really a regression in this patch AFAICS - in the existing
code the get_gfn() callers end up calling this function anyway.)

> >> -    /* Pause domain if request came from guest and gfn has paging type */
> >> -    if ( p2m_is_paging(p2mt) && v->domain == d )
> >> +    /* No need to inform pager if the gfn is not in the page-out path */
> >> +    if ( !send_request )
> >>      {
> >> -        vcpu_pause_nosync(v);
>
> I see no sin in stacking vcpu_pauses. Plus, this will need to stay if we
> (hopefully) adopt get_gfn_sleep.

Yes, if we do it that way, this pause definitely needs to stay. But if
not then the waitqueue takes care of this case entirely, and this code
is just redundant and confusing.

Cheers,

Tim.
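Tim's point about the wake condition can be captured in a tiny model. The `toy_*` enum below is invented for illustration (the real values are Xen's `p2m_type_t` states); it shows why the condition is "not mid-page-in" rather than "not paging at all":

```c
#include <assert.h>

/* Toy p2m types -- only the states discussed in the thread. */
enum toy_p2mt {
    toy_ram_rw,
    toy_ram_paging_out,
    toy_ram_paged,
    toy_ram_paging_in,
};

/* The patch's wake condition: leave the wait queue whenever the gfn is
 * not in the middle of a page-in.  Crucially this also wakes when the
 * type is toy_ram_paged -- the page came in and was paged straight back
 * out while we slept -- so the retry loop ('goto again') can send a
 * fresh populate request instead of the vcpu stalling forever.  Waking
 * only on "not paging at all" would miss that case. */
static int toy_should_wake(enum toy_p2mt t)
{
    return t != toy_ram_paging_in;
}
```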
Andres Lagar-Cavilla
2012-Feb-17 20:57 UTC
Re: [RFC] [PATCH] x86/mm: use wait queues for mem_paging
> Hi,
>
> At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
>> Well, thanks for taking a stab at it. Looks fairly well. My main
>> observation is that we do not want to unconditionally go on a wait queue
>> everywhere. For example, the grant table code pretty reliably unwinds
>> itself and correctly tells the grant mapper to retry, in the presence of
>> a paged page.
>
> The grant table code is unlikely to hit a paged-out gfn in the caller's
> address space.

Oh yeah they do. At least for the target gfns behind the grants. That's
why we need the retry loops in the PV backends in dom0.

>
>> The same could be said of emulation (X86_EMUL_RETRY).
>
> Fair enough. But the emulator uses hvm_copy() in its callbacks anyway, so
> we'd need another flag to say 'special handling for p2m fixups plz' in
> all the hvm_copy() interfaces. Unpleasant!
>
>> And certainly the balloon code should not sleep waiting for populate!
>
> The balloon code should not be trying to populate at all!

Agreed. get_gfn_query is needed there. Nothing bad happens because we
immediately send drop_page. But it's wrong-ish.

>
>> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
>> wait queue code is invoked only there. And then we can call
>> get_gfn_sleep in vmx_load_pdptrs, in guest page table walks, and in
>> __hvm_copy.
>
> I see. In the longer term I definitely do not want get_gfn()'s callers
> to have to deal with p2m_is_paged(), and PoD, and sharing, and every
> other p2m hack we come up with. I want all that stuff hidden behind
> get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.
>
> I don't like get_gfn_sleep() very much but I'll see if I can find a way
> to do the equivalent with a flag to get_gfn_*(). But I suspect it will
> end up with a tri-state argument of:
>  a) just give me the current state;
>  b) try (without sleeping) to fix up paging, PoD and sharing but
>     still maybe make me handle it myself by error propagation; and
>  c) just fix it up.
>
> 'b' is a pretty weird interface. Specially given that in corner cases,
> 'make me handle it' might involve hitting a wait queue anyway. If we
> intend in the fullness of time to get rid of it (as I hope we would)
> then probably we shouldn't introduce it. (And if that means pushing
> this whole change out past 4.2, we should consider doing that.)
>

It's weird. b) is an evolutionary side-effect. It should disappear. I
guess what I'm arguing for is "it should disappear in the long term, not
on a patch so close to 4.2".

>> >> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
>> >> --- a/xen/arch/x86/mm/p2m.c	Thu Feb 16 08:48:23 2012 +0100
>> >> +++ b/xen/arch/x86/mm/p2m.c	Thu Feb 16 14:39:13 2012 +0000
>> >> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
>> >>      }
>> >>
>> >>      /* For now only perform locking on hap domains */
>> >> -    if ( locked && (hap_enabled(p2m->domain)) )
>> >> +    locked = locked && hap_enabled(p2m->domain);
>> >> +
>> >> +#ifdef __x86_64__
>> When are we getting rid of 32 bit hypervisor? (half kidding. Only half)
>
> Can't be soon enough for me. I've been sharpening the axe for that one
> for years now. :)

What's stopping the death blow?

>
>> >> +again:
>> >> +#endif
>> >> +    if ( locked )
>> >>          /* Grab the lock here, don't release until put_gfn */
>> >>          gfn_lock(p2m, gfn, 0);
>> >>
>> >>      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
>> >>
>> >>  #ifdef __x86_64__
>> >> +    if ( p2m_is_paging(*t) && q != p2m_query
>> >> +         && p2m->domain == current->domain )
>> >> +    {
>> >> +        if ( locked )
>> >> +            gfn_unlock(p2m, gfn, 0);
>> >> +
>> >> +        /* Ping the pager */
>> >> +        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
>> >> +            p2m_mem_paging_populate(p2m->domain, gfn);
>>
>> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
>> still have the mfn in place.
>
> Doesn't p2m_mem_paging_populate already DTRT in all these cases? If
> not, it really should.
>

It's not about populate, it's about killing the domain unnecessarily in
the in_atomic check.

>> >> +
>> >> +        /* Safety catch: it _should_ be safe to wait here but if it's not,
>> >> +         * crash the VM, not the host */
>> >> +        if ( in_atomic() )
>> >> +        {
>> >> +            WARN();
>> >> +            domain_crash(p2m->domain);
>> >> +        }
>> >> +        else
>> >> +        {
>> >> +            current->arch.mem_paging_gfn = gfn;
>> >> +            wait_event(current->arch.mem_paging_wq, ({
>> >> +                mfn = p2m->get_entry(p2m, gfn, t, a,
>> >> +                                     p2m_query, page_order);
>> >> +                (*t != p2m_ram_paging_in);
>>
>> Well, first of all the p2m->get_entry will not happen in a locked
>> context, so you will exit the wait loop not holding the p2m lock and
>> crash later.
>
> Yes - we always exit the loop without the lock; then we 'goto again' and
> do things properly. I realise it's a bit inefficient, but on the
> page-in path the p2m lookups shouldn't be the bottleneck. :) I did
> write the version that handled the locking in this loop but it got a bit
> twisty, plus we need the 'goto again' for other reasons (see below).

Yes, but you'll be reading the p2m in a racy manner.

>
> As a follow-up I ought to make the page-sharing fixup code that's just
> below this use 'goto again' as well; we shouldn't really leave this
> function until we get a lookup where all these cases are dealt with.
>

I was intending to do that and then I stepped back and ended up with the
current patch submitted a couple of days ago. Same problem, can't safely
go to a wait queue.

I think we won't be able to have a magic
get_gfn_solve_everything_silently() call until we find a solution to the
fact that get_gfn happens in locked contexts. I've thought about a wacky
idea that
1. schedules a sleep-n-retry softirq
2. fails the call
3. avoids crashing the domain on the unwind path
4. when executing softirqs pauses the vcpu as per 1.
5. when woken up retries the last failed hypercall/vmexit/whatever

I shouldn't have said that out loud...

>> So you want __get_gfn_type_access with q = p2m_query, and then put_gfn
>> if the condition fails. Probably not gonna fit in a ({}) block ;)
>>
>> I think checking for (!p2m_is_paging(*t)) is more appropriate.
>
> No, that was deliberate, and IIRC a change from Olaf's patch. If the
> page gets paged in and back out while we're asleep, the type goes back
> to p2m_ram_paged, and in that case we _must_ break out and retry or this
> vcpu might stall forever.

I see, a new populate event is needed.

>
>> >>  void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
>> >>  {
>> >>      struct vcpu *v = current;
>> >> -    mem_event_request_t req;
>> >> +    mem_event_request_t req = {0};
>> >>      p2m_type_t p2mt;
>> >>      p2m_access_t a;
>> >>      mfn_t mfn;
>> >>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> >> +    int send_request = 0;
>> >>
>> >>      /* We're paging. There should be a ring */
>> >>      int rc = mem_event_claim_slot(d, &d->mem_event->paging);
>>
>> Given all the optimizations around the send_request flag, it might be
>> worth moving the claim_slot call to an if(send_request) block at the
>> bottom, and get rid of cancel_slot altogether. Claim_slot could
>> potentially go (or cause someone else to go) to a wait queue, and it's
>> best to not be that rude.
>
> Sure. It might be tricky to make sure we unwind properly in the failure
> case, but I'll have a go at a follow-up patch that looks at that.
>
> (This is not really a regression in this patch AFAICS - in the existing
> code the get_gfn() callers end up calling this function anyway.)

*After* they've put gfn. Not a regression, just cleaner code imo.

Andres

>
>> >> -    /* Pause domain if request came from guest and gfn has paging type */
>> >> -    if ( p2m_is_paging(p2mt) && v->domain == d )
>> >> +    /* No need to inform pager if the gfn is not in the page-out path */
>> >> +    if ( !send_request )
>> >>      {
>> >> -        vcpu_pause_nosync(v);
>>
>> I see no sin in stacking vcpu_pauses. Plus, this will need to stay if we
>> (hopefully) adopt get_gfn_sleep.
>
> Yes, if we do it that way, this pause definitely needs to stay. But if
> not then the waitqueue takes care of this case entirely, and this code
> is just redundant and confusing.
>
> Cheers,
>
> Tim.
>
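The in_atomic() argument above boils down to a three-way decision, which could be modelled like this. This is a hypothetical sketch condensing the point being argued (not Xen code; `toy_fixup` and its enums are invented names): when the type is paging_out the mfn is still present, so an atomic caller could arguably use it instead of the domain being crashed.

```c
#include <assert.h>

enum toy_p2mt {
    toy_ram_rw,
    toy_ram_paging_out,
    toy_ram_paged,
    toy_ram_paging_in,
};

enum toy_action {
    TOY_USE_MFN,       /* mfn still there: no need to sleep at all */
    TOY_WAIT,          /* sleep on the vcpu's waitq until page-in */
    TOY_CRASH_DOMAIN,  /* can't safely wait: crash the VM, not the host */
};

/* Refinement discussed in the thread: check mfn validity before ever
 * reaching the in_atomic() safety catch, so a p2m_ram_paging_out gfn
 * (whose mfn is still in place) doesn't get the domain killed. */
static enum toy_action toy_fixup(enum toy_p2mt t, int mfn_is_valid,
                                 int in_atomic)
{
    if (t == toy_ram_paging_out && mfn_is_valid)
        return TOY_USE_MFN;
    if (in_atomic)
        return TOY_CRASH_DOMAIN;
    return TOY_WAIT;
}
```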
Hi,

I'm about to repost this patch, with a few other changes and fixups.
Further comments below.

At 12:57 -0800 on 17 Feb (1329483430), Andres Lagar-Cavilla wrote:
> > At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
> >> Well, thanks for taking a stab at it. Looks fairly well. My main
> >> observation is that we do not want to unconditionally go on a wait queue
> >> everywhere. For example, the grant table code pretty reliably unwinds
> >> itself and correctly tells the grant mapper to retry, in the presence of
> >> a paged page.
> >
> > The grant table code is unlikely to hit a paged-out gfn in the caller's
> > address space.
>
> Oh yeah they do. At least for the target gfns behind the grants. That's
> why we need the retry loops in the PV backends in dom0.

The target gfns aren't in the caller's p2m, though.

> > The balloon code should not be trying to populate at all!
>
> Agreed. get_gfn_query is needed there. Nothing bad happens because we
> immediately send drop_page. But it's wrong-ish.

Fixed.

> >> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
> >> wait queue code is invoked only there. And then we can call get_gfn_sleep
> >> in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy.
> >
> > I see. In the longer term I definitely do not want get_gfn()'s callers
> > to have to deal with p2m_is_paged(), and PoD, and sharing, and every
> > other p2m hack we come up with. I want all that stuff hidden behind
> > get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.
> >
> > I don't like get_gfn_sleep() very much but I'll see if I can find a way
> > to do the equivalent with a flag to get_gfn_*(). But I suspect it will
> > end up with a tri-state argument of:
> > a) just give me the current state;
> > b) try (without sleeping) to fix up paging, PoD and sharing but
> >    still maybe make me handle it myself by error propagation; and
> > c) just fix it up.
> >
> > 'b' is a pretty weird interface. Especially given that in corner cases,
> > 'make me handle it' might involve hitting a wait queue anyway. If we
> > intend in the fullness of time to get rid of it (as I hope we would)
> > then probably we shouldn't introduce it. (And if that means pushing
> > this whole change out past 4.2, we should consider doing that.)
>
> It's weird. b) is an evolutionary side-effect. It should disappear. I
> guess what I'm arguing for is "it should disappear in the long term, not
> on a patch so close to 4.2".

Well, I tried making the 'but don't go to a waitq' flag; the
implementation is trivial on top of the patches I'm about to post, but
the question of when to set it was so ugly I gave up.

I have hit another stumbling block though - get_two_gfns() can't safely
ask get_gfn to populate or unshare if that might involve a waitq, since
by definition one of its two calls is made with a lock held. I can't
see a nice way of having it retry (not one that's guaranteed to make
progress) without pulling the fixup-and-retry up into that function. I
might do that in the next revision.

> > Can't be soon enough for me. I've been sharpening the axe for that one
> > for years now. :)
>
> What's stopping the death blow?

People said they still need it.

> >> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
> >> still have the mfn in place.
> >
> > Doesn't p2m_mem_paging_populate already DTRT in all these cases? If
> > not, it really should.
>
> It's not about populate, it's about killing the domain unnecessarily in
> the in_atomic check.

Oh I see, thanks. Fixed.

> >> >> +
> >> >> +    /* Safety catch: it _should_ be safe to wait here but if it's not,
> >> >> +     * crash the VM, not the host */
> >> >> +    if ( in_atomic() )
> >> >> +    {
> >> >> +        WARN();
> >> >> +        domain_crash(p2m->domain);
> >> >> +    }
> >> >> +    else
> >> >> +    {
> >> >> +        current->arch.mem_paging_gfn = gfn;
> >> >> +        wait_event(current->arch.mem_paging_wq, ({
> >> >> +            mfn = p2m->get_entry(p2m, gfn, t, a,
> >> >> +                                 p2m_query, page_order);
> >> >> +            (*t != p2m_ram_paging_in);
> >>
> >> Well, first of all mfn->get_entry will not happen in a locked context, so
> >> you will exit the wait loop not holding the p2m lock and crash later.
> >
> > Yes - we always exit the loop without the lock; then we 'goto again' and
> > do things properly. I realise it's a bit inefficient, but on the
> > page-in path the p2m lookups shouldn't be the bottleneck. :) I did
> > write the version that handled the locking in this loop but it got a bit
> > twisty, plus we need the 'goto again' for other reasons (see below).
>
> Yes, but, you'll be reading the p2m in a racy manner.

I don't think there are any real races here. Either we see
p2m_ram_paging_in (and someone will eventually wake us when it pages in)
or we don't (and we go back and do the lookup safely).

> > As a follow-up I ought to make the page-sharing fixup code that's just
> > below this use 'goto again' as well; we shouldn't really leave this
> > function until we get a lookup where all these cases are dealt with.

On closer inspection the page-sharing version is OK because it doesn't
release the lock, and the unshare function won't leave us with a
paged-out page.

> I think we won't be able to have a magic
> get_gfn_solve_everything_silently() call until we find a solution to the
> fact that get_gfn happens in locked contexts.

Yes, I agree, and I'm coming round to the opinion that we might be too
late to fix that in 4.2. :|

> I've thought about a wacky idea that
> 1. schedules a sleep-n-retry softirq
> 2. fails the call
> 3. avoids crashing the domain on the unwind path
> 4. when executing softirqs pauses the vcpu as per 1.
> 5. When woken up retries the last failed hypercall/vmexit/whatever
>
> I shouldn't have said that out loud...

The first time I approached the idea of paging guest RAM (before Patrick
started on the current system) I prototyped something not a million
miles from this (basically, a sort of setjmp() on hypervisor entry) but
the question of making all hypercalls and VMEXITs idempotent scared me
off. :)

> >> >> void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >> >> {
> >> >>     struct vcpu *v = current;
> >> >> -   mem_event_request_t req;
> >> >> +   mem_event_request_t req = {0};
> >> >>     p2m_type_t p2mt;
> >> >>     p2m_access_t a;
> >> >>     mfn_t mfn;
> >> >>     struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> >> +   int send_request = 0;
> >> >>
> >> >>     /* We're paging. There should be a ring */
> >> >>     int rc = mem_event_claim_slot(d, &d->mem_event->paging);
> >>
> >> Given all the optimizations around the send_request flag, it might be
> >> worth moving the claim_slot call to an if(send_request) block at the
> >> bottom, and get rid of cancel_slot altogether. Claim_slot could
> >> potentially go (or cause someone else to go) to a wait queue, and it's
> >> best not to be that rude.
> >
> > Sure. It might be tricky to make sure we unwind properly in the failure
> > case, but I'll have a go at a follow-up patch that looks at that.
> >
> > (This is not really a regression in this patch AFAICS - in the existing
> > code the get_gfn() callers end up calling this function anyway.)
>
> *After* they've put gfn. Not a regression, just cleaner code imo.

This is also called after putting the gfn - but in any case I've
reshuffled the mem_event_claim_slot in the new version.

Cheers,

Tim.
Andres Lagar-Cavilla
2012-Feb-23 15:55 UTC
Re: [RFC] [PATCH] x86/mm: use wait queues for mem_paging
> Hi,
>
> I'm about to repost this patch, with a few other changes and fixups.
> Further comments below.
>
> At 12:57 -0800 on 17 Feb (1329483430), Andres Lagar-Cavilla wrote:
>> > At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
>> >> Well, thanks for taking a stab at it. Looks fairly well. My main
>> >> observation is that we do not want to unconditionally go on a wait queue
>> >> everywhere. For example, the grant table code pretty reliably unwinds
>> >> itself and correctly tells the grant mapper to retry, in the presence of
>> >> a paged page.
>> >
>> > The grant table code is unlikely to hit a paged-out gfn in the caller's
>> > address space.
>>
>> Oh yeah they do. At least for the target gfns behind the grants. That's
>> why we need the retry loops in the PV backends in dom0.
>
> The target gfns aren't in the caller's p2m, though.
>
>> > The balloon code should not be trying to populate at all!
>>
>> Agreed. get_gfn_query is needed there. Nothing bad happens because we
>> immediately send drop_page. But it's wrong-ish.
>
> Fixed.
>
>> >> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
>> >> wait queue code is invoked only there. And then we can call get_gfn_sleep
>> >> in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy.
>> >
>> > I see. In the longer term I definitely do not want get_gfn()'s callers
>> > to have to deal with p2m_is_paged(), and PoD, and sharing, and every
>> > other p2m hack we come up with. I want all that stuff hidden behind
>> > get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.
>> >
>> > I don't like get_gfn_sleep() very much but I'll see if I can find a way
>> > to do the equivalent with a flag to get_gfn_*(). But I suspect it will
>> > end up with a tri-state argument of:
>> > a) just give me the current state;
>> > b) try (without sleeping) to fix up paging, PoD and sharing but
>> >    still maybe make me handle it myself by error propagation; and
>> > c) just fix it up.
>> >
>> > 'b' is a pretty weird interface. Especially given that in corner cases,
>> > 'make me handle it' might involve hitting a wait queue anyway. If we
>> > intend in the fullness of time to get rid of it (as I hope we would)
>> > then probably we shouldn't introduce it. (And if that means pushing
>> > this whole change out past 4.2, we should consider doing that.)
>>
>> It's weird. b) is an evolutionary side-effect. It should disappear. I
>> guess what I'm arguing for is "it should disappear in the long term, not
>> on a patch so close to 4.2"
>
> Well, I tried making the 'but don't go to a waitq' flag; the
> implementation is trivial on top of the patches I'm about to post, but
> the question of when to set it was so ugly I gave up.
>
> I have hit another stumbling block though - get_two_gfns() can't safely
> ask get_gfn to populate or unshare if that might involve a waitq, since
> by definition one of its two calls is made with a lock held. I can't
> see a nice way of having it retry (not one that's guaranteed to make
> progress) without pulling the fixup-and-retry up into that function. I
> might do that in the next revision.

One way to do this is use query_unlocked on the second. If we sense
danger, drop the lock on the first, then do the right thing for the
second, then drop the lock for the second, then retry at the top.
Front-runner for ugliest-thing-in-the-hypervisor award.

hvm_task_switch also gets two gfns, although in a manner that prevents
easily using get_two_gfns (hvm_map_entry on gdt offsets). It's unlikely
this will ever be a problem (it's somewhat reasonable to expect both gdt
entries to fall in the same gfn) but it's something worth keeping in mind.

>> > Can't be soon enough for me. I've been sharpening the axe for that one
>> > for years now. :)
>>
>> What's stopping the death blow?
>
> People said they still need it.
>
>> >> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
>> >> still have the mfn in place.
>> >
>> > Doesn't p2m_mem_paging_populate already DTRT in all these cases? If
>> > not, it really should.
>>
>> It's not about populate, it's about killing the domain unnecessarily in
>> the in_atomic check.
>
> Oh I see, thanks. Fixed.
>
>> >> >> +
>> >> >> +    /* Safety catch: it _should_ be safe to wait here but if it's not,
>> >> >> +     * crash the VM, not the host */
>> >> >> +    if ( in_atomic() )
>> >> >> +    {
>> >> >> +        WARN();
>> >> >> +        domain_crash(p2m->domain);
>> >> >> +    }
>> >> >> +    else
>> >> >> +    {
>> >> >> +        current->arch.mem_paging_gfn = gfn;
>> >> >> +        wait_event(current->arch.mem_paging_wq, ({
>> >> >> +            mfn = p2m->get_entry(p2m, gfn, t, a,
>> >> >> +                                 p2m_query, page_order);
>> >> >> +            (*t != p2m_ram_paging_in);
>> >>
>> >> Well, first of all mfn->get_entry will not happen in a locked context, so
>> >> you will exit the wait loop not holding the p2m lock and crash later.
>> >
>> > Yes - we always exit the loop without the lock; then we 'goto again' and
>> > do things properly. I realise it's a bit inefficient, but on the
>> > page-in path the p2m lookups shouldn't be the bottleneck. :) I did
>> > write the version that handled the locking in this loop but it got a bit
>> > twisty, plus we need the 'goto again' for other reasons (see below).
>>
>> Yes, but, you'll be reading the p2m in a racy manner.
>
> I don't think there are any real races here. Either we see
> p2m_ram_paging_in (and someone will eventually wake us when it pages in)
> or we don't (and we go back and do the lookup safely).
>
>> > As a follow-up I ought to make the page-sharing fixup code that's just
>> > below this use 'goto again' as well; we shouldn't really leave this
>> > function until we get a lookup where all these cases are dealt with.
>
> On closer inspection the page-sharing version is OK because it doesn't
> release the lock, and the unshare function won't leave us with a
> paged-out page.
>
>> I think we won't be able to have a magic
>> get_gfn_solve_everything_silently() call until we find a solution to the
>> fact that get_gfn happens in locked contexts.
>
> Yes, I agree, and I'm coming round to the opinion that we might be too
> late to fix that in 4.2. :|
>
>> I've thought about a wacky idea that
>> 1. schedules a sleep-n-retry softirq
>> 2. fails the call
>> 3. avoids crashing the domain on the unwind path
>> 4. when executing softirqs pauses the vcpu as per 1.
>> 5. When woken up retries the last failed hypercall/vmexit/whatever
>>
>> I shouldn't have said that out loud...
>
> The first time I approached the idea of paging guest RAM (before Patrick
> started on the current system) I prototyped something not a million
> miles from this (basically, a sort of setjmp() on hypervisor entry) but
> the question of making all hypercalls and VMEXITs idempotent scared me
> off. :)

We've used long jump-like stuff. It's fragile, but it carried us through
the day. It's a non-starter here, I hope.

Looking forward to the new patch.

Andres

>> >> >> void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
>> >> >> {
>> >> >>     struct vcpu *v = current;
>> >> >> -   mem_event_request_t req;
>> >> >> +   mem_event_request_t req = {0};
>> >> >>     p2m_type_t p2mt;
>> >> >>     p2m_access_t a;
>> >> >>     mfn_t mfn;
>> >> >>     struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> >> >> +   int send_request = 0;
>> >> >>
>> >> >>     /* We're paging. There should be a ring */
>> >> >>     int rc = mem_event_claim_slot(d, &d->mem_event->paging);
>> >>
>> >> Given all the optimizations around the send_request flag, it might be
>> >> worth moving the claim_slot call to an if(send_request) block at the
>> >> bottom, and get rid of cancel_slot altogether. Claim_slot could
>> >> potentially go (or cause someone else to go) to a wait queue, and it's
>> >> best not to be that rude.
>> >
>> > Sure. It might be tricky to make sure we unwind properly in the failure
>> > case, but I'll have a go at a follow-up patch that looks at that.
>> >
>> > (This is not really a regression in this patch AFAICS - in the existing
>> > code the get_gfn() callers end up calling this function anyway.)
>>
>> *After* they've put gfn. Not a regression, just cleaner code imo.
>
> This is also called after putting the gfn - but in any case I've
> reshuffled the mem_event_claim_slot in the new version.
>
> Cheers,
>
> Tim.
At 07:55 -0800 on 23 Feb (1329983736), Andres Lagar-Cavilla wrote:
> > I have hit another stumbling block though - get_two_gfns() can't safely
> > ask get_gfn to populate or unshare if that might involve a waitq, since
> > by definition one of its two calls is made with a lock held. I can't
> > see a nice way of having it retry (not one that's guaranteed to make
> > progress) without pulling the fixup-and-retry up into that function. I
> > might do that in the next revision.
>
> One way to do this is use query_unlocked on the second. If we sense
> danger, drop the lock on the first, then do the right thing for the
> second, then drop the lock for the second, then retry at the top.

Sadly, this isn't guaranteed to make progress in the face of an
aggressive pager or sharer. It might be OK with yet another waitq
invocation, though...

> hvm_task_switch also gets two gfns, although in a manner that prevents
> easily using get_two_gfns (hvm_map_entry on gdt offsets). It's unlikely
> this will ever be a problem (it's somewhat reasonable to expect both gdt
> entries to fall in the same gfn) but it's something worth keeping in mind.

Thanks, I'll add it to the list.

Tim.