Andres Lagar-Cavilla
2012-Apr-24 19:34 UTC
[PATCH 0 of 3] x86/mm: Relieve contention of p2m lock on three hot paths
This series is motivated by recent scalability issues reported by Yang Z Zhang from Intel. The general pattern used is to perform a synchronized p2m query to take care of paging in, PoD allocation or unsharing, and then take a reference to the underlying page. The reference ensures the page can be safely mapped by the caller, as it won't be paged out. The p2m lock can then be released, and is not held while the actual work is performed on the page. This reduces p2m locked critical sections to a minimum.

The pattern is applied to page table walking and emulation of rep movs. By checking for the bogus zero ram_gpa value in the general emulation function, we also reduce contention on the p2m lock for that hot path.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

 xen/arch/x86/mm/hap/guest_walk.c |   6 ++++-
 xen/arch/x86/hvm/emulate.c       |  27 +++++++--------------
 xen/arch/x86/hvm/emulate.c       |  48 +++++++++++++++++++++------------------
 3 files changed, 40 insertions(+), 41 deletions(-)
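In outline, the pattern each of the three patches applies is the following (a minimal sketch only, assuming the usual Xen declarations in scope -- struct domain *d, unsigned long gfn, mfn_t mfn, p2m_type_t p2mt, struct page_info *page -- with do_long_running_work() a hypothetical stand-in for the caller's real job):

    /* Query the p2m under the gfn lock: this pages in, allocates PoD
     * memory, or unshares, as needed. */
    mfn = get_gfn_unshare(d, gfn, &p2mt);
    if ( !mfn_valid(mfn_x(mfn)) )
    {
        put_gfn(d, gfn);
        return X86EMUL_RETRY;
    }

    /* Pin the underlying page so it cannot be freed or paged out... */
    page = mfn_to_page(mfn_x(mfn));
    if ( !get_page(page, d) )
    {
        put_gfn(d, gfn);
        return X86EMUL_RETRY;
    }

    /* ...then drop the gfn ref (and with it the p2m lock) before the
     * slow part, keeping the locked critical section minimal. */
    put_gfn(d, gfn);

    do_long_running_work(page);    /* p2m lock not held here */

    put_page(page);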
Andres Lagar-Cavilla
2012-Apr-24 19:34 UTC
[PATCH 1 of 3] x86/mm: Relieve contention for p2m lock in gva_to_gfn
 xen/arch/x86/mm/hap/guest_walk.c |  6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

We don't need to hold the p2m lock for the duration of the guest walk. We
need to ensure liveness of the top level page.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 34c7e6be9265 -r 58fd70123787 xen/arch/x86/mm/hap/guest_walk.c
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -52,6 +52,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA
 {
     uint32_t missing;
     mfn_t top_mfn;
+    struct page_info *top_page;
     void *top_map;
     p2m_type_t p2mt;
     p2m_access_t p2ma;
@@ -85,13 +86,16 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA
 
     /* Map the top-level table and call the tree-walker */
     ASSERT(mfn_valid(mfn_x(top_mfn)));
+    top_page = mfn_to_page(mfn_x(top_mfn));
+    ASSERT(get_page(top_page, p2m->domain));
+    __put_gfn(p2m, top_gfn);
     top_map = map_domain_page(mfn_x(top_mfn));
 #if GUEST_PAGING_LEVELS == 3
     top_map += (cr3 & ~(PAGE_MASK | 31));
 #endif
     missing = guest_walk_tables(v, p2m, ga, &gw, pfec[0], top_mfn, top_map);
     unmap_domain_page(top_map);
-    __put_gfn(p2m, top_gfn);
+    put_page(top_page);
 
     /* Interpret the answer */
     if ( missing == 0 )
Andres Lagar-Cavilla
2012-Apr-24 19:34 UTC
[PATCH 2 of 3] x86/emulate: Relieve contention of p2m lock in emulation of rep movs
 xen/arch/x86/hvm/emulate.c |  27 +++++++++------------------
 1 files changed, 9 insertions(+), 18 deletions(-)

get_two_gfns is used to query the source and target physical addresses of the
emulated rep movs. It is not necessary to hold onto the two gfns for the
duration of the emulation: further calls down the line will do the appropriate
unsharing or paging in, and unwind correctly on failure.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 58fd70123787 -r 2ffc676120b8 xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -714,25 +714,23 @@ static int hvmemul_rep_movs(
     if ( rc != X86EMUL_OKAY )
         return rc;
 
+    /* The query on the gfns is to establish the need for mmio. In the two mmio
+     * cases, a proper get_gfn for the "other" gfn will be enacted, with paging in
+     * or unsharing if necessary. In the memmove case, the gfn might change given
+     * the bytes mov'ed, and, again, proper get_gfn's will be enacted in
+     * __hvm_copy. */
     get_two_gfns(current->domain, sgpa >> PAGE_SHIFT, &sp2mt, NULL, NULL,
                  current->domain, dgpa >> PAGE_SHIFT, &dp2mt, NULL, NULL,
                  P2M_ALLOC, &tg);
-
+    put_two_gfns(&tg);
+
     if ( !p2m_is_ram(sp2mt) && !p2m_is_grant(sp2mt) )
-    {
-        rc = hvmemul_do_mmio(
+        return hvmemul_do_mmio(
             sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL);
-        put_two_gfns(&tg);
-        return rc;
-    }
 
     if ( !p2m_is_ram(dp2mt) && !p2m_is_grant(dp2mt) )
-    {
-        rc = hvmemul_do_mmio(
+        return hvmemul_do_mmio(
             dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL);
-        put_two_gfns(&tg);
-        return rc;
-    }
 
     /* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */
     bytes = *reps * bytes_per_rep;
@@ -747,10 +745,7 @@ static int hvmemul_rep_movs(
      * can be emulated by a source-to-buffer-to-destination block copy.
      */
     if ( ((dgpa + bytes_per_rep) > sgpa) && (dgpa < (sgpa + bytes)) )
-    {
-        put_two_gfns(&tg);
         return X86EMUL_UNHANDLEABLE;
-    }
 
     /* Adjust destination address for reverse copy. */
     if ( df )
@@ -759,10 +754,7 @@ static int hvmemul_rep_movs(
     /* Allocate temporary buffer. Fall back to slow emulation if this fails. */
     buf = xmalloc_bytes(bytes);
     if ( buf == NULL )
-    {
-        put_two_gfns(&tg);
         return X86EMUL_UNHANDLEABLE;
-    }
 
     /*
      * We do a modicum of checking here, just for paranoia's sake and to
@@ -773,7 +765,6 @@ static int hvmemul_rep_movs(
     rc = hvm_copy_to_guest_phys(dgpa, buf, bytes);
 
     xfree(buf);
-    put_two_gfns(&tg);
 
     if ( rc == HVMCOPY_gfn_paged_out )
         return X86EMUL_RETRY;
Andres Lagar-Cavilla
2012-Apr-24 19:34 UTC
[PATCH 3 of 3] x86/emulation: No need to get_gfn on zero ram_gpa
 xen/arch/x86/hvm/emulate.c |  48 ++++++++++++++++++++++++---------------------
 1 files changed, 26 insertions(+), 22 deletions(-)

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 2ffc676120b8 -r 7a7443e80b99 xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -60,33 +60,37 @@ static int hvmemul_do_io(
     ioreq_t *p = get_ioreq(curr);
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
-    mfn_t ram_mfn;
+    mfn_t ram_mfn = _mfn(INVALID_MFN);
     int rc;
 
-    /* Check for paged out page */
-    ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
-    if ( p2m_is_paging(p2mt) )
-    {
-        put_gfn(curr->domain, ram_gfn);
-        p2m_mem_paging_populate(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-    if ( p2m_is_shared(p2mt) )
-    {
-        put_gfn(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-
-    /* Maintain a ref on the mfn to ensure liveness. Put the gfn
-     * to avoid potential deadlock wrt event channel lock, later. */
-    if ( mfn_valid(mfn_x(ram_mfn)) )
-        if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
-                       curr->domain) )
+    /* Many callers pass a stub zero ram_gpa address. */
+    if ( ram_gfn != 0 )
+    {
+        /* Check for paged out page */
+        ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
+        if ( p2m_is_paging(p2mt) )
         {
-            put_gfn(curr->domain, ram_gfn);
+            put_gfn(curr->domain, ram_gfn);
+            p2m_mem_paging_populate(curr->domain, ram_gfn);
             return X86EMUL_RETRY;
         }
-        put_gfn(curr->domain, ram_gfn);
+        if ( p2m_is_shared(p2mt) )
+        {
+            put_gfn(curr->domain, ram_gfn);
+            return X86EMUL_RETRY;
+        }
+
+        /* Maintain a ref on the mfn to ensure liveness. Put the gfn
+         * to avoid potential deadlock wrt event channel lock, later. */
+        if ( mfn_valid(mfn_x(ram_mfn)) )
+            if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
+                           curr->domain) )
+            {
+                put_gfn(curr->domain, ram_gfn);
+                return X86EMUL_RETRY;
+            }
+        put_gfn(curr->domain, ram_gfn);
+    }
 
     /*
      * Weird-sized accesses have undefined behaviour: we discard writes
Tim Deegan
2012-Apr-25 12:51 UTC
Re: [PATCH 1 of 3] x86/mm: Relieve contention for p2m lock in gva_to_gfn
At 15:34 -0400 on 24 Apr (1335281651), Andres Lagar-Cavilla wrote:
> xen/arch/x86/mm/hap/guest_walk.c |  6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> 
> We don't need to hold the p2m lock for the duration of the guest walk. We need
> to ensure liveness of the top level page.

I'm not sure I want to start taking these s/gfn-lock/get-page/ changes
piecemeal without a clear understanding of why the gfn-locking is not
needed. My understanding was that

 - get_page protects against the page being freed and reused.
 - gfn-lock protects against the GFN being reused, or the page
   being reused within the VM.

Are there some cases where we only care about the first of these and
some where we care about both? In other words, can we replace the
gfn-locking everywhere with get_page/put_page (i.e. get rid of put_gfn)?

Also:

> @@ -85,13 +86,16 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA
> 
>      /* Map the top-level table and call the tree-walker */
>      ASSERT(mfn_valid(mfn_x(top_mfn)));
> +    top_page = mfn_to_page(mfn_x(top_mfn));
> +    ASSERT(get_page(top_page, p2m->domain));

ASSERT(foo) is compiled out on non-debug builds, so that's definitely
not what you want.

I personally dislike this idiom with BUG_ON() as well, because even
though BUG_ON(some side-effecting statement) may be correct, my eye
tends to skip over it when skimming code.

Cheers,

Tim.
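For reference, a form of that hunk that keeps the side effect on release builds (where NDEBUG compiles ASSERT() away) might look like this -- a sketch only; the failure path shown is illustrative, not taken from the patch:

    top_page = mfn_to_page(mfn_x(top_mfn));
    /* Keep get_page() outside ASSERT() so it still executes when
     * assertions are compiled out. */
    if ( !get_page(top_page, p2m->domain) )
    {
        /* Should be unreachable while the gfn ref is held, but fail
         * safely rather than continue with an unpinned page. */
        __put_gfn(p2m, top_gfn);
        return INVALID_GFN;
    }
    __put_gfn(p2m, top_gfn);
    /* ... proceed to map and walk as in the patch ... */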
Tim Deegan
2012-Apr-25 12:54 UTC
Re: [PATCH 2 of 3] x86/emulate: Relieve contention of p2m lock in emulation of rep movs
At 15:34 -0400 on 24 Apr (1335281652), Andres Lagar-Cavilla wrote:
> xen/arch/x86/hvm/emulate.c |  27 +++++++++------------------
>  1 files changed, 9 insertions(+), 18 deletions(-)
> 
> 
> get_two_gfns is used to query the source and target physical addresses of the
> emulated rep movs. It is not necessary to hold onto the two gfns for the
> duration of the emulation: further calls down the line will do the appropriate
> unsharing or paging in, and unwind correctly on failure.
> 
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> 
> diff -r 58fd70123787 -r 2ffc676120b8 xen/arch/x86/hvm/emulate.c
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -714,25 +714,23 @@ static int hvmemul_rep_movs(
>      if ( rc != X86EMUL_OKAY )
>          return rc;
> 
> +    /* The query on the gfns is to establish the need for mmio. In the two mmio
> +     * cases, a proper get_gfn for the "other" gfn will be enacted, with paging in
> +     * or unsharing if necessary. In the memmove case, the gfn might change given
> +     * the bytes mov'ed, and, again, proper get_gfn's will be enacted in
> +     * __hvm_copy. */
>      get_two_gfns(current->domain, sgpa >> PAGE_SHIFT, &sp2mt, NULL, NULL,
>                   current->domain, dgpa >> PAGE_SHIFT, &dp2mt, NULL, NULL,
>                   P2M_ALLOC, &tg);
> -
> +    put_two_gfns(&tg);

If we're just going to put_ these immediately, we don't need to use
get_two_gfns().

Tim.
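A sketch of what that simplification might look like -- assuming the plain get_gfn()/put_gfn() pair (get_gfn being the P2M_ALLOC flavour of the query, matching the flags the existing call passes); this is not code from the thread:

    /* Two independent type queries. No cross-gfn lock ordering is
     * needed, because neither gfn ref is held across the other. */
    (void)get_gfn(current->domain, sgpa >> PAGE_SHIFT, &sp2mt);
    put_gfn(current->domain, sgpa >> PAGE_SHIFT);

    (void)get_gfn(current->domain, dgpa >> PAGE_SHIFT, &dp2mt);
    put_gfn(current->domain, dgpa >> PAGE_SHIFT);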
Tim Deegan
2012-Apr-25 13:11 UTC
Re: [PATCH 3 of 3] x86/emulation: No need to get_gfn on zero ram_gpa
At 15:34 -0400 on 24 Apr (1335281653), Andres Lagar-Cavilla wrote:
> xen/arch/x86/hvm/emulate.c |  48 ++++++++++++++++++++++++---------------------
>  1 files changed, 26 insertions(+), 22 deletions(-)
> 
> 
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> 
> diff -r 2ffc676120b8 -r 7a7443e80b99 xen/arch/x86/hvm/emulate.c
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -60,33 +60,37 @@ static int hvmemul_do_io(
>      ioreq_t *p = get_ioreq(curr);
>      unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
>      p2m_type_t p2mt;
> -    mfn_t ram_mfn;
> +    mfn_t ram_mfn = _mfn(INVALID_MFN);
>      int rc;
> 
> -    /* Check for paged out page */
> -    ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
> -    if ( p2m_is_paging(p2mt) )
> -    {
> -        put_gfn(curr->domain, ram_gfn);
> -        p2m_mem_paging_populate(curr->domain, ram_gfn);
> -        return X86EMUL_RETRY;
> -    }
> -    if ( p2m_is_shared(p2mt) )
> -    {
> -        put_gfn(curr->domain, ram_gfn);
> -        return X86EMUL_RETRY;
> -    }
> -
> -    /* Maintain a ref on the mfn to ensure liveness. Put the gfn
> -     * to avoid potential deadlock wrt event channel lock, later. */
> -    if ( mfn_valid(mfn_x(ram_mfn)) )
> -        if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
> -                       curr->domain) )
> +    /* Many callers pass a stub zero ram_gpa address. */
> +    if ( ram_gfn != 0 )

To safely gate on this, the 'stub' value needs to be made into something
that can't be confused with a real paddr, say, ram_gpa == -1. Otherwise
we lose protection for IO where the target is in page zero.

> +    {
> +        /* Check for paged out page */
> +        ram_mfn = get_gfn_unshare(curr->domain, ram_gfn, &p2mt);
> +        if ( p2m_is_paging(p2mt) )
>          {
> -            put_gfn(curr->domain, ram_gfn);
> +            put_gfn(curr->domain, ram_gfn);
> +            p2m_mem_paging_populate(curr->domain, ram_gfn);
>              return X86EMUL_RETRY;
>          }
> -        put_gfn(curr->domain, ram_gfn);
> +        if ( p2m_is_shared(p2mt) )
> +        {
> +            put_gfn(curr->domain, ram_gfn);
> +            return X86EMUL_RETRY;
> +        }
> +
> +        /* Maintain a ref on the mfn to ensure liveness. Put the gfn
> +         * to avoid potential deadlock wrt event channel lock, later. */
> +        if ( mfn_valid(mfn_x(ram_mfn)) )
> +            if ( !get_page(mfn_to_page(mfn_x(ram_mfn)),
> +                           curr->domain) )
> +            {
> +                put_gfn(curr->domain, ram_gfn);
> +                return X86EMUL_RETRY;
> +            }
> +        put_gfn(curr->domain, ram_gfn);
> +    }
> 
>      /*
>       * Weird-sized accesses have undefined behaviour: we discard writes
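One way to realise that suggestion -- a sketch with a hypothetical sentinel name (NO_RAM_GPA is not from the thread), and the real fix would also have to update every caller that currently passes 0:

    /* -1 can never be a real guest physical address, so genuine IO
     * targeting guest page zero still gets the full paging/sharing
     * checks. */
    #define NO_RAM_GPA ((paddr_t)-1)

    if ( ram_gpa != NO_RAM_GPA )
    {
        unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
        /* ... get_gfn_unshare()/get_page()/put_gfn() as in the patch ... */
    }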
Andres Lagar-Cavilla
2012-Apr-25 15:33 UTC
Re: [PATCH 1 of 3] x86/mm: Relieve contention for p2m lock in gva_to_gfn
> At 15:34 -0400 on 24 Apr (1335281651), Andres Lagar-Cavilla wrote:
>> xen/arch/x86/mm/hap/guest_walk.c |  6 +++++-
>>  1 files changed, 5 insertions(+), 1 deletions(-)
>>
>>
>> We don't need to hold the p2m lock for the duration of the guest walk.
>> We need to ensure liveness of the top level page.
>
> I'm not sure I want to start taking these s/gfn-lock/get-page/ changes
> piecemeal without a clear understanding of why the gfn-locking is not
> needed. My understanding was that
>
> - get_page protects against the page being freed and reused.
> - gfn-lock protects against the GFN being reused, or the page
>   being reused within the VM.

Most cases gfn_lock protects against are already taken care of by the end
of a query. From there on, get_page will protect you against paging out.
gfn_lock will further protect the caller from the gfn being replaced. That
is the risk we are taking in these code paths, and the assumption is that
the VM would not try that on its own pagetables, GDTs, or mmio targets. If
it did, Xen won't misbehave, but the VM will shoot itself in the foot.
Deservedly.

The gfn/p2m lock will also prevent modifications elsewhere in the p2m, and
that is good motivation for a fine-grained lock. These code paths don't
care about modifications elsewhere in the p2m.

I agree it's rather ad hoc, and I'm not happy about it. I would prefer a
proper construct that buries all the details, or finding out whether
rwlocks are feasible. I'm trying to find ways to relieve contention and
address Yang's problem before the window closes. It hasn't seemed that
successful anyway.

Andres

> Are there some cases where we only care about the first of these and
> some where we care about both? In other words, can we replace the
> gfn-locking everywhere with get_page/put_page (i.e. get rid of put_gfn)?
>
> Also:
>
>> @@ -85,13 +86,16 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA
>>
>>      /* Map the top-level table and call the tree-walker */
>>      ASSERT(mfn_valid(mfn_x(top_mfn)));
>> +    top_page = mfn_to_page(mfn_x(top_mfn));
>> +    ASSERT(get_page(top_page, p2m->domain));
>
> ASSERT(foo) is compiled out on non-debug builds, so that's definitely
> not what you want.
>
> I personally dislike this idiom with BUG_ON() as well, because even
> though BUG_ON(some side-effecting statement) may be correct, my eye
> tends to skip over it when skimming code.
>
> Cheers,
>
> Tim.
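A "proper construct that buries all the details" could be as small as a helper like the following -- purely hypothetical, with an illustrative name; it simply packages the query/pin/unlock sequence the three patches open-code:

    /* Query the p2m, pin the backing page, drop the gfn ref: the
     * caller gets page liveness without holding the p2m lock.
     * Hypothetical helper, not part of the series. */
    static struct page_info *get_page_from_gfn_unshare(
        struct domain *d, unsigned long gfn, p2m_type_t *t)
    {
        struct page_info *page = NULL;
        mfn_t mfn = get_gfn_unshare(d, gfn, t);

        if ( mfn_valid(mfn_x(mfn)) )
        {
            page = mfn_to_page(mfn_x(mfn));
            if ( !get_page(page, d) )
                page = NULL;
        }
        put_gfn(d, gfn);
        return page;    /* NULL means retry or fail; caller decides */
    }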