This is v2 of the patch I posted last week, after feedback from Andres.

The first four patches are cleanups that should probably happen even
if the main waitq patch doesn't go in.

Patch 5 is the waitq patch, updated but largely following the same
approach. I'm not suggesting it be applied right now because I know
at least one path (get_two_gfns()) needs work before it's safe.

Patch 6 is a follow-up to avoid claiming a slot unnecessarily in
p2m_mem_paging_populate(); it's also a good idea even if the main
waitq patch doesn't happen but would need to be backported since it
only applies on top of those changes.

Tim.
Tim Deegan
2012-Feb-23 16:34 UTC
[PATCH 1 of 6] mm: guest_remove_page() should not populate or unshare
Tim Deegan
2012-Feb-23 16:34 UTC
[PATCH 3 of 6] x86/mm: make 'query type' argument to get_gfn into a set of flags
Tim Deegan
2012-Feb-23 16:34 UTC
[PATCH 5 of 6] [RFC] x86/mm: use wait queues for mem_paging
Tim Deegan
2012-Feb-23 16:34 UTC
[PATCH 6 of 6] x86/mm: Don't claim a slot on the paging ring if we might not need it
At 16:34 +0000 on 23 Feb (1330014862), Tim Deegan wrote:
> This is v2 of the patch I posted last week, after feedback from Andres.
>
> The first four patches are cleanups that should probably happen even
> if the main waitq patch doesn't go in.
>
> Patch 5 is the waitq patch, updated but largely following the same
> approach. I'm not suggesting it be applied right now because I know
> at least one path (get_two_gfns()) needs work before it's safe.
>
> Patch 6 is a follow-up to avoid claiming a slot unnecessarily in
> p2m_mem_paging_populate(); it's also a good idea even if the main
> waitq patch doesn't happen but would need to be backported since
> it only applies on top of those changes.

My apologies for these being attached rather than inline -- my hg email
runes are clearly rusty. :)

Tim
Andres Lagar-Cavilla
2012-Feb-23 16:45 UTC
Re: [PATCH 3 of 6] x86/mm: make 'query type' argument to get_gfn into a set of flags
On Feb 23, 2012, at 11:34 AM, Tim Deegan wrote:> # HG changeset patch > # User Tim Deegan <tim@xen.org> > # Date 1330013729 0 > # Node ID 00f61d0186a6b098b349b6f94c9a11c5a18dc99a > # Parent ec94098841bfae20101608f3daf7a1a01ff71c49 > x86/mm: make ''query type'' argument to get_gfn into a set of flags > > Having an enum for this won''t work if we want to add any orthogonal > options to it -- the existing code is only correct (after the removal of > p2m_guest in the previous patch) because there are no tests anywhere for > ''== p2m_alloc'', only for ''!= p2m_query'' and ''== p2m_unshare''. > > Replace it with a set of flags. > > Signed-off-by: Tim Deegan <tim@xen.org>P2M_UNSHARE flag should imply P2M_ALLOC. They make no sense separate, and code is simplified a bit. Otherwise: Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>> > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/hvm/emulate.c > --- a/xen/arch/x86/hvm/emulate.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/hvm/emulate.c Thu Feb 23 16:15:29 2012 +0000 > @@ -696,7 +696,7 @@ static int hvmemul_rep_movs( > > get_two_gfns(current->domain, sgpa >> PAGE_SHIFT, &sp2mt, NULL, NULL, > current->domain, dgpa >> PAGE_SHIFT, &dp2mt, NULL, NULL, > - p2m_alloc, &tg); > + P2M_ALLOC, &tg); > > if ( !p2m_is_ram(sp2mt) && !p2m_is_grant(sp2mt) ) > { > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/hvm/hvm.c > --- a/xen/arch/x86/hvm/hvm.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/hvm/hvm.c Thu Feb 23 16:15:29 2012 +0000 > @@ -1235,7 +1235,7 @@ int hvm_hap_nested_page_fault(unsigned l > > p2m = p2m_get_hostp2m(v->domain); > mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, > - access_w ? p2m_unshare : p2m_alloc, NULL); > + P2M_ALLOC | (access_w ? P2M_UNSHARE : 0), NULL); > > /* Check access permissions first, then handle faults */ > if ( mfn_x(mfn) != INVALID_MFN ) > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/hvm/svm/svm.c > --- a/xen/arch/x86/hvm/svm/svm.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/hvm/svm/svm.c Thu Feb 23 16:15:29 2012 +0000 > @@ -1265,7 +1265,7 @@ static void svm_do_nested_pgfault(struct > p2m = p2m_get_p2m(v); > _d.gpa = gpa; > _d.qualification = 0; > - mfn = get_gfn_type_access(p2m, gfn, &_d.p2mt, &p2ma, p2m_query, NULL); > + mfn = get_gfn_type_access(p2m, gfn, &_d.p2mt, &p2ma, 0, NULL); > __put_gfn(p2m, gfn); > _d.mfn = mfn_x(mfn); > > @@ -1287,7 +1287,7 @@ static void svm_do_nested_pgfault(struct > if ( p2m == NULL ) > p2m = p2m_get_p2m(v); > /* Everything else is an error. 
*/ > - mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, p2m_query, NULL); > + mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL); > __put_gfn(p2m, gfn); > gdprintk(XENLOG_ERR, > "SVM violation gpa %#"PRIpaddr", mfn %#lx, type %i\n", > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/guest_walk.c > --- a/xen/arch/x86/mm/guest_walk.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/guest_walk.c Thu Feb 23 16:15:29 2012 +0000 > @@ -98,7 +98,8 @@ static inline void *map_domain_gfn(struc > void *map; > > /* Translate the gfn, unsharing if shared */ > - *mfn = get_gfn_type_access(p2m, gfn_x(gfn), p2mt, &p2ma, p2m_unshare, NULL); > + *mfn = get_gfn_type_access(p2m, gfn_x(gfn), p2mt, &p2ma, > + P2M_ALLOC | P2M_UNSHARE, NULL); > if ( p2m_is_paging(*p2mt) ) > { > ASSERT(!p2m_is_nestedp2m(p2m)); > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/hap/guest_walk.c > --- a/xen/arch/x86/mm/hap/guest_walk.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/hap/guest_walk.c Thu Feb 23 16:15:29 2012 +0000 > @@ -60,7 +60,8 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA > > /* Get the top-level table''s MFN */ > top_gfn = cr3 >> PAGE_SHIFT; > - top_mfn = get_gfn_type_access(p2m, top_gfn, &p2mt, &p2ma, p2m_unshare, NULL); > + top_mfn = get_gfn_type_access(p2m, top_gfn, &p2mt, &p2ma, > + P2M_ALLOC | P2M_UNSHARE, NULL); > if ( p2m_is_paging(p2mt) ) > { > ASSERT(!p2m_is_nestedp2m(p2m)); > @@ -96,7 +97,8 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA > if ( missing == 0 ) > { > gfn_t gfn = guest_l1e_get_gfn(gw.l1e); > - (void)get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, p2m_unshare, NULL); > + (void)get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, > + P2M_ALLOC | P2M_UNSHARE, NULL); > if ( p2m_is_paging(p2mt) ) > { > ASSERT(!p2m_is_nestedp2m(p2m)); > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/hap/nested_hap.c > --- a/xen/arch/x86/mm/hap/nested_hap.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/hap/nested_hap.c Thu Feb 23 16:15:29 2012 +0000 > @@ -150,7 +150,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain > > /* walk L0 P2M table */ > mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, &p2mt, &p2ma, > - p2m_query, page_order); > + 0, page_order); > > rc = NESTEDHVM_PAGEFAULT_MMIO; > if ( p2m_is_mmio(p2mt) ) > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/mem_sharing.c > --- a/xen/arch/x86/mm/mem_sharing.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/mem_sharing.c Thu Feb 23 16:15:29 2012 +0000 > @@ -734,7 +734,7 @@ int mem_sharing_share_pages(struct domai > > get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn, > cd, cgfn, &cmfn_type, NULL, &cmfn, > - p2m_query, &tg); > + 0, &tg); > > /* This tricky business is to avoid two callers deadlocking if > * grabbing pages in opposite client/source order */ > @@ -849,7 +849,7 @@ int mem_sharing_add_to_physmap(struct do > > get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn, > cd, cgfn, &cmfn_type, &a, &cmfn, > - p2m_query, &tg); > + 0, &tg); > > /* Get the source shared page, check and lock */ > ret = XENMEM_SHARING_OP_S_HANDLE_INVALID; > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/p2m-ept.c > --- a/xen/arch/x86/mm/p2m-ept.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/p2m-ept.c Thu Feb 23 16:15:29 2012 +0000 > @@ -514,7 +514,7 @@ static mfn_t ept_get_entry(struct p2m_do > goto out; > else if ( ret == GUEST_TABLE_POD_PAGE ) > { > - if ( q == p2m_query ) > + if ( !(q & P2M_ALLOC) ) > { > *t = p2m_populate_on_demand; > goto out; > @@ -541,7 +541,7 @@ static mfn_t ept_get_entry(struct p2m_do > > if ( 
ept_entry->sa_p2mt == p2m_populate_on_demand ) > { > - if ( q == p2m_query ) > + if ( !(q & P2M_ALLOC) ) > { > *t = p2m_populate_on_demand; > goto out; > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/p2m-pod.c > --- a/xen/arch/x86/mm/p2m-pod.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/p2m-pod.c Thu Feb 23 16:15:29 2012 +0000 > @@ -529,7 +529,7 @@ p2m_pod_decrease_reservation(struct doma > p2m_access_t a; > p2m_type_t t; > > - (void)p2m->get_entry(p2m, gpfn + i, &t, &a, p2m_query, NULL); > + (void)p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL); > > if ( t == p2m_populate_on_demand ) > pod++; > @@ -570,7 +570,7 @@ p2m_pod_decrease_reservation(struct doma > p2m_type_t t; > p2m_access_t a; > > - mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL); > if ( t == p2m_populate_on_demand ) > { > set_p2m_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid, p2m->default_access); > @@ -656,7 +656,7 @@ p2m_pod_zero_check_superpage(struct p2m_ > for ( i=0; i<SUPERPAGE_PAGES; i++ ) > { > p2m_access_t a; > - mfn = p2m->get_entry(p2m, gfn + i, &type, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn + i, &type, &a, 0, NULL); > > if ( i == 0 ) > { > @@ -786,7 +786,7 @@ p2m_pod_zero_check(struct p2m_domain *p2 > for ( i=0; i<count; i++ ) > { > p2m_access_t a; > - mfns[i] = p2m->get_entry(p2m, gfns[i], types + i, &a, p2m_query, NULL); > + mfns[i] = p2m->get_entry(p2m, gfns[i], types + i, &a, 0, NULL); > /* If this is ram, and not a pagetable or from the xen heap, and probably not mapped > elsewhere, map it; otherwise, skip. */ > if ( p2m_is_ram(types[i]) > @@ -932,7 +932,7 @@ p2m_pod_emergency_sweep(struct p2m_domai > for ( i=p2m->pod.reclaim_single; i > 0 ; i-- ) > { > p2m_access_t a; > - (void)p2m->get_entry(p2m, i, &t, &a, p2m_query, NULL); > + (void)p2m->get_entry(p2m, i, &t, &a, 0, NULL); > if ( p2m_is_ram(t) ) > { > gfns[j] = i; > @@ -1130,7 +1130,7 @@ guest_physmap_mark_populate_on_demand(st > for ( i = 0; i < (1UL << order); i++ ) > { > p2m_access_t a; > - omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, p2m_query, NULL); > + omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL); > if ( p2m_is_ram(ot) ) > { > printk("%s: gfn_to_mfn returned type %d!\n", > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/p2m-pt.c > --- a/xen/arch/x86/mm/p2m-pt.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/p2m-pt.c Thu Feb 23 16:15:29 2012 +0000 > @@ -513,7 +513,7 @@ pod_retry_l3: > (p2m_flags_to_type(l3e_get_flags(l3e)) == p2m_populate_on_demand) ) > { > /* The read has succeeded, so we know that mapping exists */ > - if ( q != p2m_query ) > + if ( q & P2M_ALLOC ) > { > if ( !p2m_pod_demand_populate(p2m, gfn, PAGE_ORDER_1G, q) ) > goto pod_retry_l3; > @@ -565,7 +565,7 @@ pod_retry_l2: > { > /* The read has succeeded, so we know that the mapping > * exits at this point. */ > - if ( q != p2m_query ) > + if ( q & P2M_ALLOC ) > { > if ( !p2m_pod_demand_populate(p2m, gfn, > PAGE_ORDER_2M, q) ) > @@ -623,7 +623,7 @@ pod_retry_l1: > { > /* The read has succeeded, so we know that the mapping > * exits at this point. 
*/ > - if ( q != p2m_query ) > + if ( q & P2M_ALLOC ) > { > if ( !p2m_pod_demand_populate(p2m, gfn, > PAGE_ORDER_4K, q) ) > @@ -714,7 +714,7 @@ pod_retry_l3: > { > if ( p2m_flags_to_type(l3e_get_flags(*l3e)) == p2m_populate_on_demand ) > { > - if ( q != p2m_query ) > + if ( q & P2M_ALLOC ) > { > if ( !p2m_pod_demand_populate(p2m, gfn, PAGE_ORDER_1G, q) ) > goto pod_retry_l3; > @@ -753,7 +753,7 @@ pod_retry_l2: > /* PoD: Try to populate a 2-meg chunk */ > if ( p2m_flags_to_type(l2e_get_flags(*l2e)) == p2m_populate_on_demand ) > { > - if ( q != p2m_query ) { > + if ( q & P2M_ALLOC ) { > if ( !p2m_pod_demand_populate(p2m, gfn, PAGE_ORDER_2M, q) ) > goto pod_retry_l2; > } else > @@ -786,7 +786,7 @@ pod_retry_l1: > /* PoD: Try to populate */ > if ( p2m_flags_to_type(l1e_get_flags(*l1e)) == p2m_populate_on_demand ) > { > - if ( q != p2m_query ) { > + if ( q & P2M_ALLOC ) { > if ( !p2m_pod_demand_populate(p2m, gfn, PAGE_ORDER_4K, q) ) > goto pod_retry_l1; > } else > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/p2m.c > --- a/xen/arch/x86/mm/p2m.c Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/p2m.c Thu Feb 23 16:15:29 2012 +0000 > @@ -167,7 +167,7 @@ mfn_t __get_gfn_type_access(struct p2m_d > mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order); > > #ifdef __x86_64__ > - if ( q == p2m_unshare && p2m_is_shared(*t) ) > + if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) ) > { > ASSERT(!p2m_is_nestedp2m(p2m)); > mem_sharing_unshare_page(p2m->domain, gfn, 0); > @@ -180,7 +180,7 @@ mfn_t __get_gfn_type_access(struct p2m_d > { > /* Return invalid_mfn to avoid caller''s access */ > mfn = _mfn(INVALID_MFN); > - if (q != p2m_query) > + if ( q & P2M_ALLOC ) > domain_crash(p2m->domain); > } > #endif > @@ -367,7 +367,7 @@ void p2m_teardown(struct p2m_domain *p2m > for ( gfn=0; gfn < p2m->max_mapped_pfn; gfn++ ) > { > p2m_access_t a; > - mfn = p2m->get_entry(p2m, gfn, &t, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL); > if ( mfn_valid(mfn) && (t == p2m_ram_shared) ) > { > ASSERT(!p2m_is_nestedp2m(p2m)); > @@ -437,7 +437,7 @@ p2m_remove_page(struct p2m_domain *p2m, > { > for ( i = 0; i < (1UL << page_order); i++ ) > { > - mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, p2m_query, NULL); > + mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL); > if ( !p2m_is_grant(t) && !p2m_is_shared(t) ) > set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY); > ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) ); > @@ -499,7 +499,7 @@ guest_physmap_add_entry(struct domain *d > /* First, remove m->p mappings for existing p->m mappings */ > for ( i = 0; i < (1UL << page_order); i++ ) > { > - omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, p2m_query, NULL); > + omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL); > #ifdef __x86_64__ > if ( p2m_is_shared(ot) ) > { > @@ -512,7 +512,7 @@ guest_physmap_add_entry(struct domain *d > p2m_unlock(p2m); > return rc; > } > - omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, p2m_query, NULL); > + omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL); > ASSERT(!p2m_is_shared(ot)); > } > #endif /* __x86_64__ */ > @@ -561,7 +561,7 @@ guest_physmap_add_entry(struct domain *d > * address */ > P2M_DEBUG("aliased! 
mfn=%#lx, old gfn=%#lx, new gfn=%#lx\n", > mfn + i, ogfn, gfn + i); > - omfn = p2m->get_entry(p2m, ogfn, &ot, &a, p2m_query, NULL); > + omfn = p2m->get_entry(p2m, ogfn, &ot, &a, 0, NULL); > if ( p2m_is_ram(ot) && !p2m_is_paged(ot) ) > { > ASSERT(mfn_valid(omfn)); > @@ -620,7 +620,7 @@ p2m_type_t p2m_change_type(struct domain > > gfn_lock(p2m, gfn, 0); > > - mfn = p2m->get_entry(p2m, gfn, &pt, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &pt, &a, 0, NULL); > if ( pt == ot ) > set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, nt, p2m->default_access); > > @@ -648,7 +648,7 @@ void p2m_change_type_range(struct domain > > for ( gfn = start; gfn < end; gfn++ ) > { > - mfn = p2m->get_entry(p2m, gfn, &pt, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &pt, &a, 0, NULL); > if ( pt == ot ) > set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, nt, p2m->default_access); > } > @@ -674,7 +674,7 @@ set_mmio_p2m_entry(struct domain *d, uns > return 0; > > gfn_lock(p2m, gfn, 0); > - omfn = p2m->get_entry(p2m, gfn, &ot, &a, p2m_query, NULL); > + omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL); > if ( p2m_is_grant(ot) ) > { > p2m_unlock(p2m); > @@ -710,7 +710,7 @@ clear_mmio_p2m_entry(struct domain *d, u > return 0; > > gfn_lock(p2m, gfn, 0); > - mfn = p2m->get_entry(p2m, gfn, &t, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL); > > /* Do not use mfn_valid() here as it will usually fail for MMIO pages. */ > if ( (INVALID_MFN == mfn_x(mfn)) || (t != p2m_mmio_direct) ) > @@ -741,7 +741,7 @@ set_shared_p2m_entry(struct domain *d, u > return 0; > > gfn_lock(p2m, gfn, 0); > - omfn = p2m->get_entry(p2m, gfn, &ot, &a, p2m_query, NULL); > + omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL); > /* At the moment we only allow p2m change if gfn has already been made > * sharable first */ > ASSERT(p2m_is_shared(ot)); > @@ -793,7 +793,7 @@ int p2m_mem_paging_nominate(struct domai > > gfn_lock(p2m, gfn, 0); > > - mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL); > > /* Check if mfn is valid */ > if ( !mfn_valid(mfn) ) > @@ -856,7 +856,7 @@ int p2m_mem_paging_evict(struct domain * > gfn_lock(p2m, gfn, 0); > > /* Get mfn */ > - mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL); > if ( unlikely(!mfn_valid(mfn)) ) > goto out; > > @@ -980,7 +980,7 @@ void p2m_mem_paging_populate(struct doma > > /* Fix p2m mapping */ > gfn_lock(p2m, gfn, 0); > - mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL); > /* Allow only nominated or evicted pages to enter page-in path */ > if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged ) > { > @@ -1042,7 +1042,7 @@ int p2m_mem_paging_prep(struct domain *d > > gfn_lock(p2m, gfn, 0); > > - mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL); > > ret = -ENOENT; > /* Allow missing pages */ > @@ -1135,7 +1135,7 @@ void p2m_mem_paging_resume(struct domain > if ( !(rsp.flags & MEM_EVENT_FLAG_DROP_PAGE) ) > { > gfn_lock(p2m, rsp.gfn, 0); > - mfn = p2m->get_entry(p2m, rsp.gfn, &p2mt, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, rsp.gfn, &p2mt, &a, 0, NULL); > /* Allow only pages which were prepared properly, or pages which > * were nominated but not evicted */ > if ( mfn_valid(mfn) && (p2mt == p2m_ram_paging_in) ) > @@ -1168,7 +1168,7 @@ bool_t p2m_mem_access_check(unsigned lon > > /* First, handle rx2rw 
conversion automatically */ > gfn_lock(p2m, gfn, 0); > - mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL); > > if ( access_w && p2ma == p2m_access_rx2rw ) > { > @@ -1297,7 +1297,7 @@ int p2m_set_mem_access(struct domain *d, > p2m_lock(p2m); > for ( pfn = start_pfn; pfn < start_pfn + nr; pfn++ ) > { > - mfn = p2m->get_entry(p2m, pfn, &t, &_a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, pfn, &t, &_a, 0, NULL); > if ( p2m->set_entry(p2m, pfn, mfn, PAGE_ORDER_4K, t, a) == 0 ) > { > rc = -ENOMEM; > @@ -1338,7 +1338,7 @@ int p2m_get_mem_access(struct domain *d, > } > > gfn_lock(p2m, gfn, 0); > - mfn = p2m->get_entry(p2m, pfn, &t, &a, p2m_query, NULL); > + mfn = p2m->get_entry(p2m, pfn, &t, &a, 0, NULL); > gfn_unlock(p2m, gfn, 0); > > if ( mfn_x(mfn) == INVALID_MFN ) > @@ -1579,7 +1579,7 @@ void audit_p2m(struct domain *d, > continue; > } > > - p2mfn = get_gfn_type_access(p2m, gfn, &type, &p2ma, p2m_query, NULL); > + p2mfn = get_gfn_type_access(p2m, gfn, &type, &p2ma, 0, NULL); > if ( mfn_x(p2mfn) != mfn ) > { > mpbad++; > diff -r ec94098841bf -r 00f61d0186a6 xen/arch/x86/mm/shadow/types.h > --- a/xen/arch/x86/mm/shadow/types.h Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/arch/x86/mm/shadow/types.h Thu Feb 23 16:15:29 2012 +0000 > @@ -193,7 +193,7 @@ static inline shadow_l4e_t shadow_l4e_fr > > /* Override get_gfn to work with gfn_t */ > #undef get_gfn_query > -#define get_gfn_query(d, g, t) get_gfn_type((d), gfn_x(g), (t), p2m_query) > +#define get_gfn_query(d, g, t) get_gfn_type((d), gfn_x(g), (t), 0) > > /* The shadow types needed for the various levels. */ > > diff -r ec94098841bf -r 00f61d0186a6 xen/include/asm-x86/guest_pt.h > --- a/xen/include/asm-x86/guest_pt.h Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/include/asm-x86/guest_pt.h Thu Feb 23 16:15:29 2012 +0000 > @@ -53,7 +53,7 @@ gfn_to_paddr(gfn_t gfn) > > /* Override get_gfn to work with gfn_t */ > #undef get_gfn > -#define get_gfn(d, g, t) get_gfn_type((d), gfn_x(g), (t), p2m_alloc) > +#define get_gfn(d, g, t) get_gfn_type((d), gfn_x(g), (t), P2M_ALLOC) > > > /* Types of the guest''s page tables and access functions for them */ > diff -r ec94098841bf -r 00f61d0186a6 xen/include/asm-x86/p2m.h > --- a/xen/include/asm-x86/p2m.h Thu Feb 23 16:15:29 2012 +0000 > +++ b/xen/include/asm-x86/p2m.h Thu Feb 23 16:15:29 2012 +0000 > @@ -115,11 +115,9 @@ typedef enum { > } p2m_access_t; > > /* Modifiers to the query */ > -typedef enum { > - p2m_query, /* Do not populate a PoD entries */ > - p2m_alloc, /* Automatically populate PoD entries */ > - p2m_unshare, /* Break c-o-w sharing; implies alloc */ > -} p2m_query_t; > +typedef unsigned int p2m_query_t; > +#define P2M_ALLOC (1u<<0) /* Populate PoD and paged-out entries */ > +#define P2M_UNSHARE (1u<<1) /* Break CoW sharing */ > > /* We use bitmaps and maks to handle groups of types */ > #define p2m_to_mask(_t) (1UL << (_t)) > @@ -330,9 +328,10 @@ static inline mfn_t get_gfn_type(struct > * N.B. get_gfn_query() is the _only_ one guaranteed not to take the > * p2m lock; none of the others can be called with the p2m or paging > * lock held. 
*/ > -#define get_gfn(d, g, t) get_gfn_type((d), (g), (t), p2m_alloc) > -#define get_gfn_query(d, g, t) get_gfn_type((d), (g), (t), p2m_query) > -#define get_gfn_unshare(d, g, t) get_gfn_type((d), (g), (t), p2m_unshare) > +#define get_gfn(d, g, t) get_gfn_type((d), (g), (t), P2M_ALLOC) > +#define get_gfn_query(d, g, t) get_gfn_type((d), (g), (t), 0) > +#define get_gfn_unshare(d, g, t) get_gfn_type((d), (g), (t), \ > + P2M_ALLOC | P2M_UNSHARE) > > /* Compatibility function exporting the old untyped interface */ > static inline unsigned long get_gfn_untyped(struct domain *d, unsigned long gpfn) > @@ -364,8 +363,7 @@ static inline mfn_t get_gfn_query_unlock > p2m_type_t *t) > { > p2m_access_t a; > - return __get_gfn_type_access(p2m_get_hostp2m(d), gfn, t, &a, > - p2m_query, NULL, 0); > + return __get_gfn_type_access(p2m_get_hostp2m(d), gfn, t, &a, 0, NULL, 0); > } > > /* General conversion function from mfn to gfn */
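A minimal sketch of the suggestion above (P2M_UNSHARE implying P2M_ALLOC),
using only the definitions from the quoted hunk; the exact bit values are
illustrative, not what was eventually committed:

    /* Modifiers to the query -- illustrative sketch of the suggestion only. */
    typedef unsigned int p2m_query_t;
    #define P2M_ALLOC    (1u<<0)               /* Populate PoD and paged-out entries */
    #define P2M_UNSHARE  ((1u<<1) | P2M_ALLOC) /* Break CoW sharing; implies alloc */

Callers such as hvm_hap_nested_page_fault() in the quoted hunk could then
pass a single flag instead of building the combination by hand:

    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
                              access_w ? P2M_UNSHARE : P2M_ALLOC, NULL);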
Andres Lagar-Cavilla
2012-Feb-23 16:48 UTC
Re: [PATCH 6 of 6] x86/mm: Don't claim a slot on the paging ring if we might not need it
On Feb 23, 2012, at 11:34 AM, Tim Deegan wrote:

> # HG changeset patch
> # User Tim Deegan <tim@xen.org>
> # Date 1330013929 0
> # Node ID 516f96172e67ff463c371357e0ab93724c8b9048
> # Parent 510d80343793227bd39b9a5d4d3b61b9a3f776e4
> x86/mm: Don't claim a slot on the paging ring if we might not need it.
>
> This avoids the possibility of hitting a wait queue for a slot and then
> finding out later that it wasn't necessary.
>
> Signed-off-by: Tim Deegan <tim@xen.org>
>
> diff -r 510d80343793 -r 516f96172e67 xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c	Thu Feb 23 16:18:09 2012 +0000
> +++ b/xen/arch/x86/mm/p2m.c	Thu Feb 23 16:18:49 2012 +0000
> @@ -1001,19 +1001,23 @@ void p2m_mem_paging_populate(struct doma
>      p2m_access_t a;
>      mfn_t mfn;
>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> -    int send_request = 0;
> +    int rc, send_request = 0;
>
> -    /* We're paging. There should be a ring */
> -    int rc = mem_event_claim_slot(d, &d->mem_event->paging);
> -    if ( rc == -ENOSYS )
> +    if ( !d->mem_event->paging.ring_page )

There is a trivial call to do this in non open-coded fashion

bool_t mem_event_check_ring(struct mem_event_domain *med);

I don't use it frequently myself… otherwise

Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

>      {
>          gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring "
>                               "in place\n", d->domain_id, gfn);
>          domain_crash(d);
>          return;
>      }
> -    else if ( rc < 0 )
> -        return;
> +
> +    /* Foreign users must grab a ring entry before doing any work since
> +     * they might not be able to get one later.  Local users shouldn't
> +     * grab a slot until they know they need it, to avoid going on a
> +     * wait-queue unnecessarily. */
> +    if ( d != current->domain )
> +        if ( mem_event_claim_slot(d, &d->mem_event->paging) < 0 )
> +            return;
>
>      /* Fix p2m mapping */
>      gfn_lock(p2m, gfn, 0);
> @@ -1042,10 +1046,18 @@ void p2m_mem_paging_populate(struct doma
>      if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged )
>      {
>          /* gfn is already on its way back and vcpu is not paused */
> -        mem_event_cancel_slot(d, &d->mem_event->paging);
> +        if ( d != current->domain )
> +            mem_event_cancel_slot(d, &d->mem_event->paging);
>          return;
>      }
>
> +    /* Now local users should claim a slot. */
> +    if ( d == current->domain )
> +    {
> +        rc = mem_event_claim_slot(d, &d->mem_event->paging);
> +        ASSERT(rc == 0);
> +    }
> +
>      /* Send request to pager */
>      req.p2mt = p2mt;
>      req.vcpu_id = v->vcpu_id;
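A rough sketch of the non-open-coded check mentioned above, assuming
mem_event_check_ring() returns nonzero when the ring is set up (its exact
behaviour in the tree at this point is an assumption, not shown in the
thread):

    /* Hypothetical replacement for the open-coded ring_page test above;
     * assumes mem_event_check_ring() returns true iff the ring exists. */
    if ( !mem_event_check_ring(&d->mem_event->paging) )
    {
        gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring "
                             "in place\n", d->domain_id, gfn);
        domain_crash(d);
        return;
    }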
Andres Lagar-Cavilla
2012-Feb-23 16:49 UTC
Re: [PATCH 0 of 6] [RFC] Use wait queues for paging, v2
On Feb 23, 2012, at 11:34 AM, Tim Deegan wrote:

> This is v2 of the patch I posted last week, after feedback from Andres.
>
> The first four patches are cleanups that should probably happen even
> if the main waitq patch doesn't go in.
>
> Patch 5 is the waitq patch, updated but largely following the same
> approach. I'm not suggesting it be applied right now because I know
> at least one path (get_two_gfns()) needs work before it's safe.
>
> Patch 6 is a follow-up to avoid claiming a slot unnecessarily in
> p2m_mem_paging_populate(); it's also a good idea even if the main
> waitq patch doesn't happen but would need to be backported since
> it only applies on top of those changes.

1, 2 and 4: Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

3 and 6 commented separately. Still looking at 5.

Andres

> Tim.
Olaf Hering
2012-Feb-24 13:45 UTC
Re: [PATCH 5 of 6] [RFC] x86/mm: use wait queues for mem_paging
On Thu, Feb 23, Tim Deegan wrote:
> This is based on an earlier RFC patch by Olaf Hering, but heavily
> simplified (removing a per-gfn queue of waiting vcpus in favour of
> a scan of all vcpus on page-in).

Tim, thanks for that work.  From just reading the change it looks ok.
A few comments below:

> +++ b/xen/arch/x86/mm/p2m.c	Thu Feb 23 16:18:09 2012 +0000
> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
>      }
>
>      /* For now only perform locking on hap domains */
> -    if ( locked && (hap_enabled(p2m->domain)) )
> +    locked = locked && hap_enabled(p2m->domain);
> +
> +#ifdef __x86_64__
> +again:
> +#endif
> +    if ( locked )
>          /* Grab the lock here, don't release until put_gfn */
>          gfn_lock(p2m, gfn, 0);
>
>      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
>
>  #ifdef __x86_64__
> +    if ( p2m_is_paging(*t) && (q & P2M_ALLOC)
> +         && p2m->domain == current->domain )
> +    {
> +        if ( locked )
> +            gfn_unlock(p2m, gfn, 0);
> +
> +        /* Ping the pager */
> +        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
> +            p2m_mem_paging_populate(p2m->domain, gfn);
> +
> +        /* Wait until the pager finishes paging it in */
> +        current->arch.mem_paging_gfn = gfn;
> +        wait_event(current->arch.mem_paging_wq, ({
> +            int done;
> +            mfn = p2m->get_entry(p2m, gfn, t, a, 0, page_order);
> +            done = (*t != p2m_ram_paging_in);

I assume p2m_mem_paging_populate() will not return until the state is
forwarded to p2m_ram_paging_in.  Maybe p2m_is_paging(*t) would make it
more obvious what this check is supposed to do.

> +            /* Safety catch: it _should_ be safe to wait here
> +             * but if it's not, crash the VM, not the host */
> +            if ( in_atomic() )
> +            {
> +                WARN();
> +                domain_crash(p2m->domain);
> +                done = 1;
> +            }
> +            done;
> +        }));
> +        goto again;
> +    }
> +#endif
> +
> +#ifdef __x86_64__
>      if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
>      {
>          ASSERT(!p2m_is_nestedp2m(p2m));
> @@ -946,17 +982,17 @@ void p2m_mem_paging_drop_page(struct dom
>   * This function needs to be called whenever gfn_to_mfn() returns any of the p2m
>   * paging types because the gfn may not be backed by a mfn.
>   *
> - * The gfn can be in any of the paging states, but the pager needs only be
> - * notified when the gfn is in the paging-out path (paging_out or paged). This
> - * function may be called more than once from several vcpus. If the vcpu belongs
> - * to the guest, the vcpu must be stopped and the pager notified that the vcpu
> - * was stopped. The pager needs to handle several requests for the same gfn.
> + * The gfn can be in any of the paging states, but the pager needs only
> + * be notified when the gfn is in the paging-out path (paging_out or
> + * paged). This function may be called more than once from several
> + * vcpus. The pager needs to handle several requests for the same gfn.
>   *
> - * If the gfn is not in the paging-out path and the vcpu does not belong to the
> - * guest, nothing needs to be done and the function assumes that a request was
> - * already sent to the pager. In this case the caller has to try again until the
> - * gfn is fully paged in again.
> + * If the gfn is not in the paging-out path nothing needs to be done and
> + * the function assumes that a request was already sent to the pager.
> + * In this case the caller has to try again until the gfn is fully paged
> + * in again.
>   */
> +
>  void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
>  {
>      struct vcpu *v = current;
> @@ -965,6 +1001,7 @@ void p2m_mem_paging_populate(struct doma
>      p2m_access_t a;
>      mfn_t mfn;
>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int send_request = 0;

Is that variable supposed to be used?

Perhaps the feature to fast-forward (or rollback) from
p2m_ram_paging_out to p2m_ram_rw could be a separate patch.  My initial
version of this patch did not have a strict requirement for this
feature, if I remember correctly.

>      /* We're paging. There should be a ring */
>      int rc = mem_event_claim_slot(d, &d->mem_event->paging);
> @@ -987,19 +1024,22 @@ void p2m_mem_paging_populate(struct doma
>          /* Evict will fail now, tag this request for pager */
>          if ( p2mt == p2m_ram_paging_out )
>              req.flags |= MEM_EVENT_FLAG_EVICT_FAIL;
> -
> -        set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
> +        if ( p2mt == p2m_ram_paging_out && mfn_valid(mfn) && v->domain == d )
> +            /* Short-cut back to paged-in state (but not for foreign
> +             * mappings, or the pager couldn't map it to page it out) */
> +            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
> +                          paging_mode_log_dirty(d)
> +                          ? p2m_ram_logdirty : p2m_ram_rw, a);
> +        else
> +        {
> +            set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_paging_in, a);
> +            send_request = 1;
> +        }
>      }
>      gfn_unlock(p2m, gfn, 0);
>
> -    /* Pause domain if request came from guest and gfn has paging type */
> -    if ( p2m_is_paging(p2mt) && v->domain == d )
> -    {
> -        vcpu_pause_nosync(v);
> -        req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
> -    }
>      /* No need to inform pager if the gfn is not in the page-out path */
> -    else if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged )
> +    if ( p2mt != p2m_ram_paging_out && p2mt != p2m_ram_paged )
>      {
>          /* gfn is already on its way back and vcpu is not paused */
>          mem_event_cancel_slot(d, &d->mem_event->paging);
On Thu, Feb 23, Tim Deegan wrote:

> This is v2 of the patch I posted last week, after feedback from Andres.

Tried this series, but processes in dom0 started to hang in D state
when a paged guest is started. I will see if I can spot the error.

Olaf
On Sun, Feb 26, Olaf Hering wrote:

> On Thu, Feb 23, Tim Deegan wrote:
>
> > This is v2 of the patch I posted last week, after feedback from Andres.
>
> Tried this series, but processes in dom0 started to hang in D state
> when a paged guest is started. I will see if I can spot the error.

This change for patch #5 is needed, especially the first part.
Now it appears to work.

Olaf

# HG changeset patch
# Parent c3738598897f5239a72cabde676f5e86fd4c8241

diff -r c3738598897f xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -999,6 +999,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     v->arch.hvm_vcpu.inject_trap = -1;
 
+#ifdef CONFIG_X86_64
+    init_waitqueue_head(&v->arch.hvm_vcpu.mem_paging_wq);
+#endif
+
 #ifdef CONFIG_COMPAT
     rc = setup_compat_arg_xlat(v);
     if ( rc != 0 )
diff -r c3738598897f xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -183,8 +183,8 @@ again:
             p2m_mem_paging_populate(p2m->domain, gfn);
 
         /* Wait until the pager finishes paging it in */
-        current->arch.mem_paging_gfn = gfn;
-        wait_event(current->arch.mem_paging_wq, ({
+        current->arch.hvm_vcpu.mem_paging_gfn = gfn;
+        wait_event(current->arch.hvm_vcpu.mem_paging_wq, ({
             int done;
             mfn = p2m->get_entry(p2m, gfn, t, a, 0, page_order);
             done = (*t != p2m_ram_paging_in);
@@ -1190,8 +1190,8 @@ void p2m_mem_paging_resume(struct domain
         }
         /* Wake any vcpus that were waiting for this GFN */
         for_each_vcpu ( d, v )
-            if ( v->arch.mem_paging_gfn == rsp.gfn )
-                wake_up_all(&v->arch.mem_paging_wq);
+            if ( v->arch.hvm_vcpu.mem_paging_gfn == rsp.gfn )
+                wake_up_all(&v->arch.hvm_vcpu.mem_paging_wq);
     }
 }
 
diff -r c3738598897f xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -494,12 +494,6 @@ struct arch_vcpu
 
     struct paging_vcpu paging;
 
-#ifdef CONFIG_X86_64
-    /* Mem-paging: this vcpu is waiting for a gfn to be paged in */
-    struct waitqueue_head mem_paging_wq;
-    unsigned long mem_paging_gfn;
-#endif
-
 #ifdef CONFIG_X86_32
     /* map_domain_page() mapping cache. */
     struct mapcache_vcpu mapcache;
diff -r c3738598897f xen/include/asm-x86/hvm/vcpu.h
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -170,6 +170,13 @@ struct hvm_vcpu {
     unsigned long inject_cr2;
 
     struct viridian_vcpu viridian;
+
+#ifdef CONFIG_X86_64
+    /* Mem-paging: this vcpu is waiting for a gfn to be paged in */
+    struct waitqueue_head mem_paging_wq;
+    unsigned long mem_paging_gfn;
+#endif
+
 };
 
 #endif /* __ASM_X86_HVM_VCPU_H__ */
Tim Deegan
2012-Feb-27 19:26 UTC
Re: [PATCH 5 of 6] [RFC] x86/mm: use wait queues for mem_paging
Hi,

At 14:45 +0100 on 24 Feb (1330094744), Olaf Hering wrote:
> >  #ifdef __x86_64__
> > +    if ( p2m_is_paging(*t) && (q & P2M_ALLOC)
> > +         && p2m->domain == current->domain )
> > +    {
> > +        if ( locked )
> > +            gfn_unlock(p2m, gfn, 0);
> > +
> > +        /* Ping the pager */
> > +        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
> > +            p2m_mem_paging_populate(p2m->domain, gfn);
> > +
> > +        /* Wait until the pager finishes paging it in */
> > +        current->arch.mem_paging_gfn = gfn;
> > +        wait_event(current->arch.mem_paging_wq, ({
> > +            int done;
> > +            mfn = p2m->get_entry(p2m, gfn, t, a, 0, page_order);
> > +            done = (*t != p2m_ram_paging_in);
>
> I assume p2m_mem_paging_populate() will not return until the state is
> forwarded to p2m_ram_paging_in. Maybe p2m_is_paging(*t) would make it
> more obvious what this check is supposed to do.

But it would be wrong.  If the type is anything other than
p2m_ram_paging_in, then we can't be sure that the pager is working on
unblocking us.  Andres made the same suggestion - clearly this code
needs a comment. :)

> > +            /* Safety catch: it _should_ be safe to wait here
> > +             * but if it's not, crash the VM, not the host */
> > +            if ( in_atomic() )
> > +            {
> > +                WARN();
> > +                domain_crash(p2m->domain);
> > +                done = 1;
> > +            }
> > +            done;
> > +        }));
> > +        goto again;
> > +    }
> > +#endif
> >  void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >  {
> >      struct vcpu *v = current;
> > @@ -965,6 +1001,7 @@ void p2m_mem_paging_populate(struct doma
> >      p2m_access_t a;
> >      mfn_t mfn;
> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> > +    int send_request = 0;
>
> Is that variable supposed to be used?

Erk.  Clearly something got mangled in the rebase.  I'll sort that out.

> Perhaps the feature to fast-forward (or rollback) from
> p2m_ram_paging_out to p2m_ram_rw could be a separate patch. My initial
> version of this patch did not have a strict requirement for this
> feature, if I remember correctly.

Sure, I can split that into a separate patch.

Cheers,

Tim.
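Since the condition is said above to need a comment, here is one way that
comment could read, dropped into the hunk quoted in the review (the wording
is a suggestion only, not part of the posted patch):

    /* Wait until the pager finishes paging it in */
    current->arch.mem_paging_gfn = gfn;
    wait_event(current->arch.mem_paging_wq, ({
        int done;
        mfn = p2m->get_entry(p2m, gfn, t, a, 0, page_order);
        /* Only keep sleeping while the entry is p2m_ram_paging_in: that is
         * the one state in which we know the pager has seen a request and
         * will eventually wake us.  Any other type means nobody is
         * guaranteed to be working on this gfn, so stop waiting and retry
         * the lookup from the top (the "goto again" above). */
        done = (*t != p2m_ram_paging_in);
        /* (in_atomic() safety catch from the patch goes here) */
        done;
    }));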
Olaf Hering
2012-Feb-27 20:18 UTC
Re: [PATCH 5 of 6] [RFC] x86/mm: use wait queues for mem_paging
On Mon, Feb 27, Tim Deegan wrote:
> At 14:45 +0100 on 24 Feb (1330094744), Olaf Hering wrote:
> > >  #ifdef __x86_64__
> > > +    if ( p2m_is_paging(*t) && (q & P2M_ALLOC)
> > > +         && p2m->domain == current->domain )
> > > +    {
> > > +        if ( locked )
> > > +            gfn_unlock(p2m, gfn, 0);
> > > +
> > > +        /* Ping the pager */
> > > +        if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
> > > +            p2m_mem_paging_populate(p2m->domain, gfn);
> > > +
> > > +        /* Wait until the pager finishes paging it in */
> > > +        current->arch.mem_paging_gfn = gfn;
> > > +        wait_event(current->arch.mem_paging_wq, ({
> > > +            int done;
> > > +            mfn = p2m->get_entry(p2m, gfn, t, a, 0, page_order);
> > > +            done = (*t != p2m_ram_paging_in);
> >
> > I assume p2m_mem_paging_populate() will not return until the state is
> > forwarded to p2m_ram_paging_in. Maybe p2m_is_paging(*t) would make it
> > more obvious what this check is supposed to do.
>
> But it would be wrong.  If the type is anything other than
> p2m_ram_paging_in, then we can't be sure that the pager is working on
> unblocking us.

Not really wrong; I think it depends on what will happen to the p2mt once
it leaves the paging state and once the condition in wait_event() is
executed again.

Now I see what it's supposed to mean: it enters wait_event() with
p2m_ram_paging_in and it is supposed to leave once the type changes to
something else. It works because the page-in path now has only one p2mt,
not two like a few weeks ago.

Olaf
Andres Lagar-Cavilla
2012-Feb-28 21:11 UTC
Re: [PATCH 0 of 6] [RFC] Use wait queues for paging, v2
> > On Feb 27, 2012, at 11:51 AM, Olaf Hering wrote: > >> On Sun, Feb 26, Olaf Hering wrote: >> >>> On Thu, Feb 23, Tim Deegan wrote: >>> >>>> This is v2 of the patch I posted last week, after feedback from >>>> Andres. >>> >>> Tried this series, but processes in dom0 started to hang in D state >>> when a paged guest is started. I will see if I can spot the error. >> >> This change for patch #5 is needed, especially the first part. >> Now it appears to work. >> >> OlafUnfortunately, I get all kinds of borking on Win7 domains with Citrix PV drivers 6.0. So it''s a no-go from my end for now. The pv drivers are calling memops on pages that are paged out, ka-boom. I applied Tim''s 6 patch series plus Olaf''s fix. Xen WARNs below. Andres (XEN) Xen WARN at p2m.c:200 (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]---- (XEN) CPU: 4 (XEN) RIP: e008:[<ffff82c4801e08e2>] __get_gfn_type_access+0x230/0x3ee (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (XEN) rax: 0000000000000001 rbx: 000000000000000b rcx: ffff83083ffaff18 (XEN) rdx: 00000043bfcac680 rsi: 0000000000000000 rdi: 000000183f775000 (XEN) rbp: ffff83083ffaf9a8 rsp: ffff83083ffaf928 r8: 0000000000000001 (XEN) r9: 0000000000000000 r10: 000000000000000c r11: fffff88002fe3b0e (XEN) r12: ffff83183f7a1bb0 r13: ffff83083ffaf9dc r14: 0000000000000000 (XEN) r15: 00000000000033c0 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 000000183f793000 cr2: 000000007724f4df (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen stack trace from rsp=ffff83083ffaf928: (XEN) ffff83083ffaff18 ffff83083ffaff18 0000000000000000 000000018012635a (XEN) ffff83083ffaf9d8 ffff83083ffaff18 ffff82c48030a760 ffff82c48030a760 (XEN) ffff82c48030a760 ffff82c48030a760 01ff82c4801264b3 0000000000000000 (XEN) ffff83183f7a3000 ffff83083ffafa28 0000000000000000 000000000183f7b9 (XEN) ffff83083ffafa08 ffff82c480170b91 ffff830000000001 ffff83183f7a3010 (XEN) 0000000000000000 ffff82c4801b3d94 0000000b00000007 ffff83183f7a3000 (XEN) fffff88002fe3b80 fffff88002fe3b80 ffff8300bf2ea000 ffff82c4801b5344 (XEN) ffff83083ffafa88 ffff82c480170e4b 8b13687800000000 7d4d94af0000009f (XEN) 0000000100007ff0 0000000000000000 00000000000033c0 ffff82c48015a3ff (XEN) 0000000000000202 ffff83183f7a3000 ffff82c4801265c1 0000000000000008 (XEN) 0000000000000007 fffff88002fe3b80 ffff8300bf2ea000 ffff82c4801b5344 (XEN) ffff83083ffafc78 ffff82c480114c72 0000000000000286 ffff83083ffafab8 (XEN) ffff82c4801265c1 ffff83183f6a19a0 ffff83083ffafae8 ffff82c48012daa1 (XEN) ffff83083ffafb18 0000000000000040 ffff83183f6a1990 ffff83183f7a3000 (XEN) ffff83083ffafb18 ffff82c48012dbc5 0000000000000040 ffff83083ffeffd0 (XEN) ffff83183f7a3000 ffff82c4803099c0 ffff83083ffafb68 ffff82c480105e63 (XEN) 0000000000000003 0000000000000000 ffff83183f6a19d0 ffff83083ffb6048 (XEN) ffff83183f7a1bb0 0000000000000001 0000000000000001 0000000000000001 (XEN) ffff83083ffafbf8 ffff82c4801e6f0d 00000000000000ff 0000000300000000 (XEN) ffff83083ffafbc8 ffff83083ffafbc0 0000000000000000 ffff83083ffafcd8 (XEN) Xen call trace: (XEN) [<ffff82c4801e08e2>] __get_gfn_type_access+0x230/0x3ee (XEN) [<ffff82c480170b91>] xenmem_add_to_physmap_once+0x610/0x773 (XEN) [<ffff82c480170e4b>] arch_memory_op+0x157/0xc7a (XEN) [<ffff82c480114c72>] do_memory_op+0x1d60/0x1dce (XEN) [<ffff82c4801b5409>] hvm_memory_op+0x62/0x6a (XEN) [<ffff82c4801b84f1>] hvm_do_hypercall+0x1af/0x2f5 (XEN) [<ffff82c4801d73a5>] vmx_vmexit_handler+0xef3/0x1849 (XEN) (XEN) domain_crash called from p2m.c:200 (XEN) Domain 2 (vcpu#0) crashed on cpu#4: (XEN) 
----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]---- (XEN) CPU: 4 (XEN) RIP: 0010:[<fffffa8002252185>] (XEN) RFLAGS: 0000000000000082 CONTEXT: hvm guest (XEN) rax: 000000000000000c rbx: 00000000000033c0 rcx: 0000000000000180 (XEN) rdx: 0000000000000007 rsi: fffff88002fe3b80 rdi: 0000000000000007 (XEN) rbp: 0000000000000001 rsp: fffff88002fe3b30 r8: fffff88002fe3b80 (XEN) r9: 00000000ffffffff r10: 0000000000000000 r11: fffff88002fe3b0e (XEN) r12: 0000000000000004 r13: fffffa800222d200 r14: fffff880010d7730 (XEN) r15: 0000000000000001 cr0: 0000000080050031 cr4: 00000000000006f8 (XEN) cr3: 0000000000187000 cr2: 000000007724f4df (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0000 cs: 0010 (XEN) Xen WARN at p2m.c:200 (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]---- (XEN) CPU: 4 (XEN) RIP: e008:[<ffff82c4801e08e2>] __get_gfn_type_access+0x230/0x3ee (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (XEN) rax: 0000000000000001 rbx: 000000000000000b rcx: ffff83083ffaff18 (XEN) rdx: 00000043bfcac680 rsi: 0000000000000000 rdi: 000000183f775000 (XEN) rbp: ffff83083ffaf9a8 rsp: ffff83083ffaf928 r8: 0000000000000001 (XEN) r9: 0000000000000000 r10: 0000000000000002 r11: 0000000000000002 (XEN) r12: ffff83183f7a1bb0 r13: ffff83083ffaf9dc r14: 0000000000000000 (XEN) r15: 00000000000033c0 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 000000183f793000 cr2: 000000007724f4df (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen stack trace from rsp=ffff83083ffaf928: (XEN) ffff83083ffaff18 ffff83083ffaff18 0000000000000000 000000018012635a (XEN) ffff83083ffaf9d8 ffff83083ffaff18 ffff82c48030a760 ffff82c48030a760 (XEN) ffff82c48030a760 ffff82c48030a760 01ff82c4801264b3 0000000000000000 (XEN) ffff83183f7a3000 ffff83083ffafa28 0000000000000000 000000000183f7b9 (XEN) ffff83083ffafa08 ffff82c480170b91 ffff830000000001 ffff83183f7a3010 (XEN) 0000000000000000 ffff82c4801b3d94 0000000b00000007 ffff83183f7a3000 (XEN) fffff88002fe3b80 fffff88002fe3b80 ffff8300bf2ea000 ffff82c4801b5344 (XEN) ffff83083ffafa88 ffff82c480170e4b 8b13687800000000 7d4d94af0000009f (XEN) 0000000100007ff0 0000000000000000 00000000000033c0 ffff82c48015a3ff (XEN) 0000000000000202 ffff83183f7a3000 ffff82c4801265c1 0000000000000008 (XEN) 0000000000000007 fffff88002fe3b80 ffff8300bf2ea000 ffff82c4801b5344 (XEN) ffff83083ffafc78 ffff82c480114c72 0000000000000286 ffff83083ffafab8 (XEN) ffff82c4801265c1 ffff83183f6a19a0 ffff83083ffafae8 ffff82c48012daa1 (XEN) ffff83083ffafb18 0000000000000040 ffff83183f6a1990 ffff83183f7a3000 (XEN) ffff83083ffafb18 ffff82c48012dbc5 0000000000000040 ffff83083ffeffd0 (XEN) ffff83183f7a3000 ffff82c4803099c0 ffff83083ffafb68 ffff82c480105e63 (XEN) 0000000000000003 0000000000000000 ffff83183f6a19d0 ffff83083ffb6048 (XEN) ffff83183f7a1bb0 0000000000000001 0000000000000001 0000000000000001 (XEN) ffff83083ffafbf8 ffff82c4801e6f0d 00000000000000ff 0000000300000000 (XEN) ffff83083ffafbc8 ffff83083ffafbc0 0000000000000000 ffff83083ffafcd8 (XEN) Xen call trace: (XEN) [<ffff82c4801e08e2>] __get_gfn_type_access+0x230/0x3ee (XEN) [<ffff82c480170b91>] xenmem_add_to_physmap_once+0x610/0x773 (XEN) [<ffff82c480170e4b>] arch_memory_op+0x157/0xc7a (XEN) [<ffff82c480114c72>] do_memory_op+0x1d60/0x1dce (XEN) [<ffff82c4801b5409>] hvm_memory_op+0x62/0x6a (XEN) [<ffff82c4801b84f1>] hvm_do_hypercall+0x1af/0x2f5 (XEN) [<ffff82c4801d73a5>] vmx_vmexit_handler+0xef3/0x1849 (XEN) (XEN) domain_crash called from p2m.c:200 (XEN) Xen WARN at p2m.c:200 (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]---- 
(XEN) CPU: 4 (XEN) RIP: e008:[<ffff82c4801e08e2>] __get_gfn_type_access+0x230/0x3ee (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (XEN) rax: 0000000000000001 rbx: 000000000000000b rcx: ffff83083ffaff18 (XEN) rdx: 00000043bfcac680 rsi: 0000000000000000 rdi: 000000183f775000 (XEN) rbp: ffff83083ffaf9a8 rsp: ffff83083ffaf928 r8: 0000000000000001 (XEN) r9: 0000000000000000 r10: 0000000000000003 r11: 0000000000000003 (XEN) r12: ffff83183f7a1bb0 r13: ffff83083ffaf9dc r14: 0000000000000000 (XEN) r15: 00000000000033c0 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 000000183f793000 cr2: 000000007724f4df (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen stack trace from rsp=ffff83083ffaf928: (XEN) ffff83083ffaff18 ffff83083ffaff18 0000000000000000 000000018012635a (XEN) ffff83083ffaf9d8 ffff83083ffaff18 ffff82c48030a760 ffff82c48030a760 (XEN) ffff82c48030a760 ffff82c48030a760 01ff82c4801264b3 0000000000000000 (XEN) ffff83183f7a3000 ffff83083ffafa28 0000000000000000 000000000183f7b9 (XEN) ffff83083ffafa08 ffff82c480170b91 ffff830000000001 ffff83183f7a3010 (XEN) 0000000000000000 ffff82c4801b3d94 0000000b00000007 ffff83183f7a3000 (XEN) fffff88002fe3b80 fffff88002fe3b80 ffff8300bf2ea000 ffff82c4801b5344 (XEN) ffff83083ffafa88 ffff82c480170e4b 8b13687800000000 7d4d94af0000009f (XEN) 0000000100007ff0 0000000000000000 00000000000033c0 ffff82c48015a3ff (XEN) 0000000000000202 ffff83183f7a3000 ffff82c4801265c1 0000000000000008 (XEN) 0000000000000007 fffff88002fe3b80 ffff8300bf2ea000 ffff82c4801b5344 (XEN) ffff83083ffafc78 ffff82c480114c72 0000000000000286 ffff83083ffafab8 (XEN) ffff82c4801265c1 ffff83183f6a19a0 ffff83083ffafae8 ffff82c48012daa1 (XEN) ffff83083ffafb18 0000000000000040 ffff83183f6a1990 ffff83183f7a3000 (XEN) ffff83083ffafb18 ffff82c48012dbc5 0000000000000040 ffff83083ffeffd0 (XEN) ffff83183f7a3000 ffff82c4803099c0 ffff83083ffafb68 ffff82c480105e63 (XEN) 0000000000000003 0000000000000000 ffff83183f6a19d0 ffff83083ffb6048 (XEN) ffff83183f7a1bb0 0000000000000001 0000000000000001 0000000000000001 (XEN) ffff83083ffafbf8 ffff82c4801e6f0d 00000000000000ff 0000000300000000 (XEN) ffff83083ffafbc8 ffff83083ffafbc0 0000000000000000 ffff83083ffafcd8 (XEN) Xen call trace: (XEN) [<ffff82c4801e08e2>] __get_gfn_type_access+0x230/0x3ee (XEN) [<ffff82c480170b91>] xenmem_add_to_physmap_once+0x610/0x773 (XEN) [<ffff82c480170e4b>] arch_memory_op+0x157/0xc7a (XEN) [<ffff82c480114c72>] do_memory_op+0x1d60/0x1dce (XEN) [<ffff82c4801b5409>] hvm_memory_op+0x62/0x6a (XEN) [<ffff82c4801b84f1>] hvm_do_hypercall+0x1af/0x2f5 (XEN) [<ffff82c4801d73a5>] vmx_vmexit_handler+0xef3/0x1849 (XEN) (XEN) domain_crash called from p2m.c:200>> >> >> # HG changeset patch >> # Parent c3738598897f5239a72cabde676f5e86fd4c8241 >> >> diff -r c3738598897f xen/arch/x86/hvm/hvm.c >> --- a/xen/arch/x86/hvm/hvm.c >> +++ b/xen/arch/x86/hvm/hvm.c >> @@ -999,6 +999,10 @@ int hvm_vcpu_initialise(struct vcpu *v) >> >> v->arch.hvm_vcpu.inject_trap = -1; >> >> +#ifdef CONFIG_X86_64 >> + init_waitqueue_head(&v->arch.hvm_vcpu.mem_paging_wq); >> +#endif >> + >> #ifdef CONFIG_COMPAT >> rc = setup_compat_arg_xlat(v); >> if ( rc != 0 ) >> diff -r c3738598897f xen/arch/x86/mm/p2m.c >> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -183,8 +183,8 @@ again: >> p2m_mem_paging_populate(p2m->domain, gfn); >> >> /* Wait until the pager finishes paging it in */ >> - current->arch.mem_paging_gfn = gfn; >> - wait_event(current->arch.mem_paging_wq, ({ >> + current->arch.hvm_vcpu.mem_paging_gfn = gfn; >> + 
wait_event(current->arch.hvm_vcpu.mem_paging_wq, ({ >> int done; >> mfn = p2m->get_entry(p2m, gfn, t, a, 0, page_order); >> done = (*t != p2m_ram_paging_in); >> @@ -1190,8 +1190,8 @@ void p2m_mem_paging_resume(struct domain >> } >> /* Wake any vcpus that were waiting for this GFN */ >> for_each_vcpu ( d, v ) >> - if ( v->arch.mem_paging_gfn == rsp.gfn ) >> - wake_up_all(&v->arch.mem_paging_wq); >> + if ( v->arch.hvm_vcpu.mem_paging_gfn == rsp.gfn ) >> + wake_up_all(&v->arch.hvm_vcpu.mem_paging_wq); >> } >> } >> >> diff -r c3738598897f xen/include/asm-x86/domain.h >> --- a/xen/include/asm-x86/domain.h >> +++ b/xen/include/asm-x86/domain.h >> @@ -494,12 +494,6 @@ struct arch_vcpu >> >> struct paging_vcpu paging; >> >> -#ifdef CONFIG_X86_64 >> - /* Mem-paging: this vcpu is waiting for a gfn to be paged in */ >> - struct waitqueue_head mem_paging_wq; >> - unsigned long mem_paging_gfn; >> -#endif >> - >> #ifdef CONFIG_X86_32 >> /* map_domain_page() mapping cache. */ >> struct mapcache_vcpu mapcache; >> diff -r c3738598897f xen/include/asm-x86/hvm/vcpu.h >> --- a/xen/include/asm-x86/hvm/vcpu.h >> +++ b/xen/include/asm-x86/hvm/vcpu.h >> @@ -170,6 +170,13 @@ struct hvm_vcpu { >> unsigned long inject_cr2; >> >> struct viridian_vcpu viridian; >> + >> +#ifdef CONFIG_X86_64 >> + /* Mem-paging: this vcpu is waiting for a gfn to be paged in */ >> + struct waitqueue_head mem_paging_wq; >> + unsigned long mem_paging_gfn; >> +#endif >> + >> }; >> >> #endif /* __ASM_X86_HVM_VCPU_H__ */ > >
On Tue, Feb 28, Andres Lagar-Cavilla wrote:
> > On Feb 27, 2012, at 11:51 AM, Olaf Hering wrote:
> >
> >> On Sun, Feb 26, Olaf Hering wrote:
> >>
> >>> On Thu, Feb 23, Tim Deegan wrote:
> >>>
> >>>> This is v2 of the patch I posted last week, after feedback from
> >>>> Andres.
> >>>
> >>> Tried this series, but processes in dom0 started to hang in D state
> >>> when a paged guest is started. I will see if I can spot the error.
> >>
> >> This change for patch #5 is needed, especially the first part.
> >> Now it appears to work.
> >>
> >> Olaf
>
> Unfortunately, I get all kinds of borking on Win7 domains with Citrix PV
> drivers 6.0. So it's a no-go from my end for now.

This is the domain_lock() in xenmem_add_to_physmap_once().

Is get_gfn_untyped() correct, or would get_gfn_query() work as well in
this context?

Olaf
On Wed, Feb 29, Olaf Hering wrote:

> This is the domain_lock() in xenmem_add_to_physmap_once().
>
> Is get_gfn_untyped() correct, or would get_gfn_query() work as well in
> this context?

Another case is emulate_privileged_op(): in the "Write CR3" case
get_gfn_untyped() is called with domain_lock() held. Same for
XENMEM_remove_from_physmap.

These are the places I found by a quick grep for get_gfn_untyped,
get_gfn and get_gfn_unshare.

Olaf
At 20:56 +0100 on 29 Feb (1330548983), Olaf Hering wrote:
> On Wed, Feb 29, Olaf Hering wrote:
>
> > This is the domain_lock() in xenmem_add_to_physmap_once().
> >
> > Is get_gfn_untyped() correct, or would get_gfn_query() work as well in
> > this context?

get_gfn_untyped() is correct.  I'm not sure that we really need to take
the domain lock while we're doing it, though.  It might be that the new
gfn locks will be enough to serialize these updates.

> Another case is emulate_privileged_op(), in "Write CR3" case
> get_gfn_untyped() is called with domain_lock().

I think that should be OK as it only happens for PV guests.

> Same for XENMEM_remove_from_physmap.

That can almost certainly be done without the domain lock.

Sadly I haven't got time to fix these all up today.  I have committed
patches 1 to 4 of this series, and made some of the changes suggested to
the other two, but there seems little point in reposting them until we
can sort out the locking.

Tim.
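For reference, the distinction being drawn here comes from the patch 3
definitions quoted earlier in the thread; the last comment below is an
inference about why the populating variant is wanted, not something stated
in the thread, and it assumes get_gfn_untyped() wraps the populating
get_gfn() lookup:

    /* From the patch 3 hunks quoted earlier: */
    #define get_gfn(d, g, t)        get_gfn_type((d), (g), (t), P2M_ALLOC)
    #define get_gfn_query(d, g, t)  get_gfn_type((d), (g), (t), 0)
    /* add_to_physmap needs the frame to actually exist, so a populating
     * lookup is wanted; a query-style lookup could hand back INVALID_MFN
     * for a PoD or paged-out gfn even though the mapping is otherwise
     * fine (presumed rationale). */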
On Thu, Mar 15, Tim Deegan wrote:
> At 20:56 +0100 on 29 Feb (1330548983), Olaf Hering wrote:
> > On Wed, Feb 29, Olaf Hering wrote:
> >
> > > This is the domain_lock() in xenmem_add_to_physmap_once().
> > >
> > > Is get_gfn_untyped() correct, or would get_gfn_query() work as well in
> > > this context?
>
> get_gfn_untyped() is correct.  I'm not sure that we really need to take
> the domain lock while we're doing it, though.  It might be that the new
> gfn locks will be enough to serialize these updates.
>
> > Another case is emulate_privileged_op(), in "Write CR3" case
> > get_gfn_untyped() is called with domain_lock().
>
> I think that should be OK as it only happens for PV guests.

I think it's just me not knowing these things, but would it make sense
to add comments to XX_lock users describing what they protect (or used
to protect, since comments can go stale over time)?

Olaf
At 16:37 +0100 on 15 Mar (1331829432), Olaf Hering wrote:
> On Thu, Mar 15, Tim Deegan wrote:
>
> > At 20:56 +0100 on 29 Feb (1330548983), Olaf Hering wrote:
> > > On Wed, Feb 29, Olaf Hering wrote:
> > >
> > > > This is the domain_lock() in xenmem_add_to_physmap_once().
> > > >
> > > > Is get_gfn_untyped() correct, or would get_gfn_query() work as well in
> > > > this context?
> >
> > get_gfn_untyped() is correct.  I'm not sure that we really need to take
> > the domain lock while we're doing it, though.  It might be that the new
> > gfn locks will be enough to serialize these updates.
> >
> > > Another case is emulate_privileged_op(), in "Write CR3" case
> > > get_gfn_untyped() is called with domain_lock().
> >
> > I think that should be OK as it only happens for PV guests.
>
> I think it's just me not knowing these things, but would it make sense
> to add comments to XX_lock users describing what they protect (or used
> to protect, since comments can go stale over time)?

Yes!  I'm trying to get this sorted out for all the MM locks, as well as
the discipline of which order to take them in.  But the domain lock
(a.k.a. the Big Lock) is an ancient and venerable lock, now poorly
understood, at least by me. :)  The add_to_physmap path started taking
this lock in cset 9187:fbeb0a5b7219, in 2006.

Tim.
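As an illustration of the kind of ordering discipline mentioned above, a
purely hypothetical helper (names and levels made up for the example, not
Xen's actual mm-locks code): give each MM lock a level, and assert that
locks are only ever taken in increasing order.

    /* Hypothetical sketch: per-CPU record of the highest lock level held. */
    static DEFINE_PER_CPU(int, mm_lock_level);

    static inline void ordered_lock(spinlock_t *l, int level)
    {
        /* Taking a lower-level lock while holding a higher one is a bug. */
        ASSERT(this_cpu(mm_lock_level) < level);
        spin_lock(l);
        this_cpu(mm_lock_level) = level;
        /* The matching unlock would restore the previous level. */
    }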
Andres Lagar-Cavilla
2012-Mar-15 15:56 UTC
Re: [PATCH 0 of 6] [RFC] Use wait queues for paging, v2
> At 16:37 +0100 on 15 Mar (1331829432), Olaf Hering wrote:
>> On Thu, Mar 15, Tim Deegan wrote:
>>
>> > At 20:56 +0100 on 29 Feb (1330548983), Olaf Hering wrote:
>> > > On Wed, Feb 29, Olaf Hering wrote:
>> > >
>> > > > This is the domain_lock() in xenmem_add_to_physmap_once().
>> > > >
>> > > > Is get_gfn_untyped() correct, or would get_gfn_query() work as
>> > > > well in this context?
>> >
>> > get_gfn_untyped() is correct.  I'm not sure that we really need to
>> > take the domain lock while we're doing it, though.  It might be that
>> > the new gfn locks will be enough to serialize these updates.
>> >
>> > > Another case is emulate_privileged_op(), in "Write CR3" case
>> > > get_gfn_untyped() is called with domain_lock().
>> >
>> > I think that should be OK as it only happens for PV guests.
>>
>> I think it's just me not knowing these things, but would it make sense
>> to add comments to XX_lock users describing what they protect (or used
>> to protect, since comments can go stale over time)?
>
> Yes!  I'm trying to get this sorted out for all the MM locks, as well as
> the discipline of which order to take them in.  But the domain lock
> (a.k.a. the Big Lock) is an ancient and venerable lock, now poorly
> understood, at least by me. :)  The add_to_physmap path started taking
> this lock in cset 9187:fbeb0a5b7219, in 2006.

We've come a long way in terms of discipline for the mm.  But there is
still the odd interaction -- see recent posts about the deadlock between
the p2m and event channel locks.  And the big domain lock is surely a big
rug thrown over a lot of painful dirt ("metaphors, not Andres's forte").

Even more venerable and hideous is the domctl lock.  It's actually
somewhat surprising the switch to memops for paging/sharing worked
rather seamlessly.

Andres

> Tim.