thr3ads.net - Xen devel - [Xen-devel] mapping problems in xenpaging [Sep 2011]

If this information is useful, please help other people find it:
Share via:

zhen shi

2011-Sep-29 14:55 UTC

[Xen-devel] mapping problems in xenpaging

Hi,Olaf,

 When we analyze and test xenpaging,we found there are some problems between
mapping and xenpaging.
 1) When mapping firstly, then do xenpaging,and the code paths have resolved
the problems.It''s OK.
 2) The problems exists if we do address mapping firstly then go to
xenpaging,and our confusions are as followings:
   a) If the domU''s memory is directly mapped to dom0,such as the
hypercall
from pv driver,then it will build a related page-table in dom0,which will
not change p2m-type.
      and then do the xenpaging to page out the domU''s memory pages
whose
gfn address have been already mapped to dom0;So it will cause some problems
when dom0
      accesses these pages.Because these pages are paged-out,and dom0 cannot
tell the p2mt before access the pages.
  b)The another situation is that if xen has mapped the domU''s page,
and get
the mfn according to pfn_to_mfn.But then the page''s p2mt is changed by
others, so when xen
    accesses the page ,it will cause problems such as BSOD or reboot.Because
the operations of getting mfn and accessing the page are not
atomic.and the situation exists
    in many code paths .
   According to the above-mentioned points,do you have any suggestions about
what to do to avoid these situations.We have thought these two problems,but
currently have no
  good method to resolve.

  I am looking forward to hearing from you. Thank you very much!  :)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Olaf Hering

2011-Sep-29 17:02 UTC

head link

[Xen-devel] Re: mapping problems in xenpaging

On Thu, Sep 29, zhen shi wrote:
>  Hi,Olaf,
>  
>  When we analyze and test xenpaging,we found there are some
problems between
> mapping and xenpaging.
>  1) When mapping firstly, then do xenpaging,and the code paths have
resolved
> the problems.It''s OK.
>  2) The problems exists if we do address mapping firstly then go to
> xenpaging,and our confusions are as followings:
>    a) If the domU''s memory is directly mapped to dom0,such as the
hypercall
> from pv driver,then it will build a related page-table in dom0,which will
not
> change p2m-type.
>       and then do the xenpaging to page out the domU''s memory
pages whose gfn
> address have been already mapped to dom0;So it will cause some problems
when
> dom0
>       accesses these pages.Because these pages are paged-out,and dom0
cannot
> tell the p2mt before access the pages.
I''m not entirely sure what you do. xenpaging runs in dom0 and is able
to
map paged-out pages. It uses that to trigger a page-in, see
tools/xenpaging/pagein.c in xen-unstable.hg
>   b)The another situation is that if xen has mapped the domU''s
page, and get
> the mfn according to pfn_to_mfn.But then the page''s p2mt is
changed by others,
> so when xen
>     accesses the page ,it will cause problems such as BSOD or
reboot.Because
> the operations of getting mfn and accessing the page are not
> atomic.and the situation exists
>     in many code paths .
Can you be more specific what you mean? Xen doesnt seem to have a
pfn_to_mfn function, only the tools have some helper macros of that name.


Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Adin Scannell

2011-Sep-30 21:02 UTC

head link

Re: [Xen-devel] Re: mapping problems in xenpaging

>>  When we analyze and test xenpaging,we found there are some
problems between
>> mapping and xenpaging.
>>  1) When mapping firstly, then do xenpaging,and the code paths have
resolved
>> the problems.It''s OK.
>>  2) The problems exists if we do address mapping firstly then go to
>> xenpaging,and our confusions are as followings:
>>    a) If the domU''s memory is directly mapped to dom0,such as
the hypercall
>> from pv driver,then it will build a related page-table in dom0,which
will not
>> change p2m-type.
>>       and then do the xenpaging to page out the domU''s memory
pages whose gfn
>> address have been already mapped to dom0;So it will cause some problems
when
>> dom0
>>       accesses these pages.Because these pages are paged-out,and dom0
cannot
>> tell the p2mt before access the pages.
>
> I''m not entirely sure what you do. xenpaging runs in dom0 and is
able to
> map paged-out pages. It uses that to trigger a page-in, see
> tools/xenpaging/pagein.c in xen-unstable.hg
Here''s my take...

Xenpaging doesn''t allow selection of pages that have been mapped by
other domains (as in p2m.c):

 669 int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn)
....
 693     /* Check page count and type */
 694     page = mfn_to_page(mfn);
 695     if ( (page->count_info & (PGC_count_mask | PGC_allocated)) ! 696
(1 | PGC_allocated) )
 697         goto out;

*However*, I think that the problem Zhen is describing still exists:
1) xenpaging nominates a page, it is successful.
2) dom0 maps the same page (a process other than xenpaging, which will
also map it).
3) xenpaging evicts the page, successfully.

I''ve experienced a few nasty crashes, and I think this could account
for a couple (but certainly not all)... I think that the solution may
be to repeat the refcount check in paging_evict, and roll back the
nomination gracefully if the race is detected. Thoughts?
>>   b)The another situation is that if xen has mapped the domU''s
page, and get
>> the mfn according to pfn_to_mfn.But then the page''s p2mt is
changed by others,
>> so when xen
>>     accesses the page ,it will cause problems such as BSOD or
reboot.Because
>> the operations of getting mfn and accessing the page are not
>> atomic.and the situation exists
>>     in many code paths .
I believe I have recreated this problem a few times, resulting in
various crashes... unfortunately, there is somewhat of an implicit
assumption throughout the code that when you grab an mfn via
gfn_to_mfn, that mfn won''t disappear underneath you (for example, see
vmx_load_pdptrs).  Really, you want something like gfn_to_mfn_getpage,
where the underlying page has its refcount bumped so that it won''t be
nominated/evicted while you map and use the page, then you must put it
back when you''re done.  I hope to look into helping fix some of these
paging bugs soon.

Cheers,
-Adin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2011-Sep-30 22:19 UTC

head link

Re: [Xen-devel] Re: mapping problems in xenpaging

At 17:02 -0400 on 30 Sep (1317402151), Adin Scannell
wrote:> I believe I have recreated this problem a few times, resulting in
> various crashes... unfortunately, there is somewhat of an implicit
> assumption throughout the code that when you grab an mfn via
> gfn_to_mfn, that mfn won''t disappear underneath you (for example,
see
> vmx_load_pdptrs).  Really, you want something like gfn_to_mfn_getpage,
> where the underlying page has its refcount bumped so that it won''t
be
> nominated/evicted while you map and use the page, then you must put it
> back when you''re done.  
Quite right - there are a lot of places that assume that a p2m mapping
won''t change underfoot, and the right answer will involve reference
counting of some kind - either better integration with the underlying
refcount/typecount system or some reference count in the p2m. 

The tricky part will be what to do when a p2m update can''t be made
because of one of those refcounts. 

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

zhen shi

2011-Oct-01 03:52 UTC

head link

[Xen-devel] Re: mapping problems in xenpaging

Olaf,maybe I didn''t described the problems clearly.I will give an
example.
a) In xen_vga_vram_map() of vga.c from tools/ioemu-qemu-xen/hw,  it uses
xc_map_foreign_pages() to map a page''s gfn address to dom0. If then the
page
is paged out and changed to zero page in xenpaging, and dom0 access the page
such as using the mapped address, it will make mistakes.Am I right?
In brief,I mean there may be some conflicts in xc_map_foreign_pages from
other functions and xenpaging feature when they access the same page.

b) In create_grant_pte_mapping() of mm.c from /xen/arch/x86, it
uses gmfn_to_mfn() to get mfn, and then executes map_domain_page(mfn). At
the same time, the page is paged_out and the mfn is changed to INVALID_MFN.
So that in create_grant_pte_mapping () when it goes to mfn_to_page(mfn), it
will make a mistake.Because xen didn''t judge the mfn and thought the
mfn was
original.
I mean there may be some conflicts of operations after getting the mfn in
xen but the page is paged_out in the meantime.

2011/9/30 Olaf Hering <olaf@aepfle.de>
> On Thu, Sep 29, zhen shi wrote:
>
> >  Hi,Olaf,
> >
> >  When we analyze and test xenpaging,we found there are some
> problems between
> > mapping and xenpaging.
> >  1) When mapping firstly, then do xenpaging,and the code paths have
> resolved
> > the problems.It''s OK.
> >  2) The problems exists if we do address mapping firstly then go to
> > xenpaging,and our confusions are as followings:
> >    a) If the domU''s memory is directly mapped to dom0,such as
the
> hypercall
> > from pv driver,then it will build a related page-table in dom0,which
will
> not
> > change p2m-type.
> >       and then do the xenpaging to page out the domU''s memory
pages whose
> gfn
> > address have been already mapped to dom0;So it will cause some
problems
> when
> > dom0
> >       accesses these pages.Because these pages are paged-out,and dom0
> cannot
> > tell the p2mt before access the pages.
>
> I''m not entirely sure what you do. xenpaging runs in dom0 and is
able to
> map paged-out pages. It uses that to trigger a page-in, see
> tools/xenpaging/pagein.c in xen-unstable.hg
>
> >   b)The another situation is that if xen has mapped the
domU''s page, and
> get
> > the mfn according to pfn_to_mfn.But then the page''s p2mt is
changed by
> others,
> > so when xen
> >     accesses the page ,it will cause problems such as BSOD or
> reboot.Because
> > the operations of getting mfn and accessing the page are not
> > atomic.and the situation exists
> >     in many code paths .
>
> Can you be more specific what you mean? Xen doesnt seem to have a
> pfn_to_mfn function, only the tools have some helper macros of that name.
>
>
> Olaf
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Olaf Hering

2011-Oct-03 14:56 UTC

head link

Re: [Xen-devel] Re: mapping problems in xenpaging

On Fri, Sep 30, Adin Scannell wrote:
> >>  When we analyze and test xenpaging,we found there are some
problems between
> >> mapping and xenpaging.
> >>  1) When mapping firstly, then do xenpaging,and the code paths
have resolved
> >> the problems.It''s OK.
> >>  2) The problems exists if we do address mapping firstly then go
to
> >> xenpaging,and our confusions are as followings:
> >>    a) If the domU''s memory is directly mapped to
dom0,such as the hypercall
> >> from pv driver,then it will build a related page-table in
dom0,which will not
> >> change p2m-type.
> >>       and then do the xenpaging to page out the domU''s
memory pages whose gfn
> >> address have been already mapped to dom0;So it will cause some
problems when
> >> dom0
> >>       accesses these pages.Because these pages are paged-out,and
dom0 cannot
> >> tell the p2mt before access the pages.
> >
> > I''m not entirely sure what you do. xenpaging runs in dom0 and
is able to
> > map paged-out pages. It uses that to trigger a page-in, see
> > tools/xenpaging/pagein.c in xen-unstable.hg
> 
> Here''s my take...
> 
> Xenpaging doesn''t allow selection of pages that have been mapped
by
> other domains (as in p2m.c):
> 
>  669 int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn)
> ....
>  693     /* Check page count and type */
>  694     page = mfn_to_page(mfn);
>  695     if ( (page->count_info & (PGC_count_mask | PGC_allocated))
!>  696          (1 | PGC_allocated) )
>  697         goto out;
> 
> *However*, I think that the problem Zhen is describing still exists:
> 1) xenpaging nominates a page, it is successful.
> 2) dom0 maps the same page (a process other than xenpaging, which will
> also map it).
> 3) xenpaging evicts the page, successfully.
> 
> I''ve experienced a few nasty crashes, and I think this could
account
> for a couple (but certainly not all)... I think that the solution may
> be to repeat the refcount check in paging_evict, and roll back the
> nomination gracefully if the race is detected. Thoughts?
Are there really code paths that touch a mfn without going through the
p2m functions? If so I will copy the check and update xenpaging.

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2011-Oct-06 11:10 UTC

head link

Re: [Xen-devel] Re: mapping problems in xenpaging

At 16:56 +0200 on 03 Oct (1317660976), Olaf Hering
wrote:> On Fri, Sep 30, Adin Scannell wrote:
> 
> > >>  When we analyze and test xenpaging,we found there are some
problems between
> > >> mapping and xenpaging.
> > >>  1) When mapping firstly, then do xenpaging,and the code
paths have resolved
> > >> the problems.It''s OK.
> > >>  2) The problems exists if we do address mapping firstly then
go to
> > >> xenpaging,and our confusions are as followings:
> > >>    a) If the domU''s memory is directly mapped to
dom0,such as the hypercall
> > >> from pv driver,then it will build a related page-table in
dom0,which will not
> > >> change p2m-type.
> > >>       and then do the xenpaging to page out the
domU''s memory pages whose gfn
> > >> address have been already mapped to dom0;So it will cause
some problems when
> > >> dom0
> > >>       accesses these pages.Because these pages are
paged-out,and dom0 cannot
> > >> tell the p2mt before access the pages.
> > >
> > > I''m not entirely sure what you do. xenpaging runs in
dom0 and is able to
> > > map paged-out pages. It uses that to trigger a page-in, see
> > > tools/xenpaging/pagein.c in xen-unstable.hg
> > 
> > Here''s my take...
> > 
> > Xenpaging doesn''t allow selection of pages that have been
mapped by
> > other domains (as in p2m.c):
> > 
> >  669 int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn)
> > ....
> >  693     /* Check page count and type */
> >  694     page = mfn_to_page(mfn);
> >  695     if ( (page->count_info & (PGC_count_mask |
PGC_allocated)) !> >  696          (1 | PGC_allocated) )
> >  697         goto out;
> > 
> > *However*, I think that the problem Zhen is describing still exists:
> > 1) xenpaging nominates a page, it is successful.
> > 2) dom0 maps the same page (a process other than xenpaging, which will
> > also map it).
> > 3) xenpaging evicts the page, successfully.
> > 
> > I''ve experienced a few nasty crashes, and I think this could
account
> > for a couple (but certainly not all)... I think that the solution may
> > be to repeat the refcount check in paging_evict, and roll back the
> > nomination gracefully if the race is detected. Thoughts?
> 
> Are there really code paths that touch a mfn without going through the
> p2m functions? If so I will copy the check and update xenpaging.
No, but there are race conditions where CPU A could to the p2m lookup,
then CPU B nominates the page and changes its p2m entry, then CPU A
completes the mapping.  In the extreme case, detecting this in the
eviction code is also subject to the same race; some sort of atomic
lookup-and-get-reference operation seems like a better fix.

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

zhen shi

2011-Oct-09 16:40 UTC

head link

Re: [Xen-devel] Re: mapping problems in xenpaging

2011/10/6 Tim Deegan <tim@xen.org>
> At 16:56 +0200 on 03 Oct (1317660976), Olaf Hering wrote:
> > On Fri, Sep 30, Adin Scannell wrote:
> >
> > > >>  When we analyze and test xenpaging,we found there are
some
> problems between
> > > >> mapping and xenpaging.
> > > >>  1) When mapping firstly, then do xenpaging,and the code
paths have
> resolved
> > > >> the problems.It''s OK.
> > > >>  2) The problems exists if we do address mapping firstly
then go to
> > > >> xenpaging,and our confusions are as followings:
> > > >>    a) If the domU''s memory is directly mapped
to dom0,such as the
> hypercall
> > > >> from pv driver,then it will build a related page-table
in dom0,which
> will not
> > > >> change p2m-type.
> > > >>       and then do the xenpaging to page out the
domU''s memory pages
> whose gfn
> > > >> address have been already mapped to dom0;So it will
cause some
> problems when
> > > >> dom0
> > > >>       accesses these pages.Because these pages are
paged-out,and
> dom0 cannot
> > > >> tell the p2mt before access the pages.
> > > >
> > > > I''m not entirely sure what you do. xenpaging runs
in dom0 and is able
> to
> > > > map paged-out pages. It uses that to trigger a page-in, see
> > > > tools/xenpaging/pagein.c in xen-unstable.hg
> > >
> > > Here''s my take...
> > >
> > > Xenpaging doesn''t allow selection of pages that have
been mapped by
> > > other domains (as in p2m.c):
> > >
> > >  669 int p2m_mem_paging_nominate(struct domain *d, unsigned long
gfn)
> > > ....
> > >  693     /* Check page count and type */
> > >  694     page = mfn_to_page(mfn);
> > >  695     if ( (page->count_info & (PGC_count_mask |
PGC_allocated)) !> > >  696          (1 | PGC_allocated) )
> > >  697         goto out;
>
>     I wonder if pages have been mapped by other domains,then thepage->count_info will be added.I have analyzed  xc_map_foreign_pages()
function,and have not figured out how to add the page->count_info
by xc_map_foreign_pages().and the page->count_info decreases in munmap().

> > > *However*, I think that the problem Zhen is describing still
exists:
> > > 1) xenpaging nominates a page, it is successful.
> > > 2) dom0 maps the same page (a process other than xenpaging, which
will
> > > also map it).
> > > 3) xenpaging evicts the page, successfully.
> > >
> > > I''ve experienced a few nasty crashes, and I think this
could account
> > > for a couple (but certainly not all)... I think that the solution
may
> > > be to repeat the refcount check in paging_evict, and roll back
the
> > > nomination gracefully if the race is detected. Thoughts?
>
> > Are there really code paths that touch a mfn without going through the
> > p2m functions? If so I will copy the check and update xenpaging.
>
> >No, but there are race conditions where CPU A could to the p2m lookup,
> >then CPU B nominates the page and changes its p2m entry, then CPU A
> >completes the mapping.  In the extreme case, detecting this in the
> >eviction code is also subject to the same race; some sort of atomic
> >lookup-and-get-reference operation seems like a better fix.
>
  Tim , Olaf and Adin, do you have any good ideas about the above
situation.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Andres Lagar Cavilla

2011-Oct-10 01:20 UTC

head link

[Xen-devel] Re: Re: mapping problems in xenpaging

I have a proposal. I''d like to hear from the list what they think.

- 1. change p2m lock to a read/write lock
- 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
callers of p2m_lock will become write lockers.
- 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
while holding the read lock.
- 4. Have all lookup callers put_page on the obtained mfn, once done.

Rationale: rwlock will prevent races between lookups and async p2m
modifications by paging, sharing, or feature X. The lookup routine
will be protected from races and able to atomically get_page on the
obtained mfn. The lookup caller will be able to work on this mfn
knowing it won''t disappear underneath (as in the case currently
brought forward by Zhen)

I''m somewhat wary of having all callers required to put_page, but I
don''t think it''s a big deal because it''s perfectly
reasonable.

I''m more wary that turning p2m locking into read/write will result in
code deadlocking itself: taking a read lock first and a write lock
later. Possibly the current rwlock implementation could be improved to
keep a cpumask of read-lockers, and provide an atomic "promote from
read to write" atomic operation (something along the lines of wait
until you''re the only reader in the cpumask, and then cmpxchg(lock,
-1, WRITE_BIAS))

Hope that made sense. Thoughts?

Andres
> Date: Mon, 10 Oct 2011 00:40:32 +0800
> From: zhen shi <bickys1986@gmail.com>
> Subject: Re: [Xen-devel] Re: mapping problems in xenpaging
> To: Tim Deegan <tim@xen.org>, Olaf Hering <olaf@aepfle.de>,    
Adin
>        Scannell <adin@gridcentric.ca>
> Cc: xen-devel@lists.xensource.com
> Message-ID:
>      
 <CACavRyA+Djzr3AVwgaZQu1-doPiMkAZ-NpdVR1nXjiiW_74PqQ@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> 2011/10/6 Tim Deegan <tim@xen.org>
>
>> At 16:56 +0200 on 03 Oct (1317660976), Olaf Hering wrote:
>> > On Fri, Sep 30, Adin Scannell wrote:
>> >
>> > > >>  When we analyze and test xenpaging,we found there
are some
>> problems between
>> > > >> mapping and xenpaging.
>> > > >>  1) When mapping firstly, then do xenpaging,and the
code paths have
>> resolved
>> > > >> the problems.It''s OK.
>> > > >>  2) The problems exists if we do address mapping
firstly then go to
>> > > >> xenpaging,and our confusions are as followings:
>> > > >>    a) If the domU''s memory is directly
mapped to dom0,such as the
>> hypercall
>> > > >> from pv driver,then it will build a related
page-table in dom0,which
>> will not
>> > > >> change p2m-type.
>> > > >>       and then do the xenpaging to page out the
domU''s memory pages
>> whose gfn
>> > > >> address have been already mapped to dom0;So it will
cause some
>> problems when
>> > > >> dom0
>> > > >>       accesses these pages.Because these pages are
paged-out,and
>> dom0 cannot
>> > > >> tell the p2mt before access the pages.
>> > > >
>> > > > I''m not entirely sure what you do. xenpaging
runs in dom0 and is able
>> to
>> > > > map paged-out pages. It uses that to trigger a page-in,
see
>> > > > tools/xenpaging/pagein.c in xen-unstable.hg
>> > >
>> > > Here''s my take...
>> > >
>> > > Xenpaging doesn''t allow selection of pages that have
been mapped by
>> > > other domains (as in p2m.c):
>> > >
>> > >  669 int p2m_mem_paging_nominate(struct domain *d, unsigned
long gfn)
>> > > ....
>> > >  693     /* Check page count and type */
>> > >  694     page = mfn_to_page(mfn);
>> > >  695     if ( (page->count_info & (PGC_count_mask |
PGC_allocated)) !>> > >  696          (1 | PGC_allocated) )
>> > >  697         goto out;
>>
>>     I wonder if pages have been mapped by other domains,then the
> page->count_info will be added.I have analyzed  xc_map_foreign_pages()
> function,and have not figured out how to add the page->count_info
> by xc_map_foreign_pages().and the page->count_info decreases in
munmap().
>
>
>> > > *However*, I think that the problem Zhen is describing still
exists:
>> > > 1) xenpaging nominates a page, it is successful.
>> > > 2) dom0 maps the same page (a process other than xenpaging,
which will
>> > > also map it).
>> > > 3) xenpaging evicts the page, successfully.
>> > >
>> > > I''ve experienced a few nasty crashes, and I think
this could account
>> > > for a couple (but certainly not all)... I think that the
solution may
>> > > be to repeat the refcount check in paging_evict, and roll
back the
>> > > nomination gracefully if the race is detected. Thoughts?
>>
>
>
>> > Are there really code paths that touch a mfn without going through
the
>> > p2m functions? If so I will copy the check and update xenpaging.
>>
>> >No, but there are race conditions where CPU A could to the p2m
lookup,
>> >then CPU B nominates the page and changes its p2m entry, then CPU A
>> >completes the mapping.  In the extreme case, detecting this in the
>> >eviction code is also subject to the same race; some sort of atomic
>> >lookup-and-get-reference operation seems like a better fix.
>>
>
>  Tim , Olaf and Adin, do you have any good ideas about the above
> situation.
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
http://lists.xensource.com/archives/html/xen-devel/attachments/20111010/55486330/attachment.html
>
> ------------------------------
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>
> End of Xen-devel Digest, Vol 80, Issue 104
> ******************************************
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2011-Oct-10 09:21 UTC

head link

[Xen-devel] Re: Re: mapping problems in xenpaging

At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla
wrote:> I have a proposal. I''d like to hear from the list what they think.
> 
> - 1. change p2m lock to a read/write lock
> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
> callers of p2m_lock will become write lockers.
> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
> while holding the read lock.
> - 4. Have all lookup callers put_page on the obtained mfn, once done.
This seems like a step in the right direction, but if we''re going to
make this big an interface change there might be better interfaces to
end up with.

A few issues I can see with it:
 - p2m lookups are on some very performance-sensitive paths
   (e.g. multiple times in any pagetable walk or instruction emulation
   in a HVM guest) so adding the rwlock might have a noticeable impact.

 - This fixes one class of races (page gets freed-to-xen underfoot) but
   leaves another one (gfn -> mfn map changes underfoot) untouched.  In
   particular it doesn''t solve the race where a foreign mapper 
   gets a r/w map of what''s meant to be a read-only frame.

I think that to fix things properly we need to have some refcount
associated with the p2m mapping itself.  That would be taken by all
lookups (or at least some - we could have a flag to the p2m lookup) and
released as you suggest, but more importantly it would block all p2m changes
while the count was raised.  (I think that a least in the common case we
could encode such a refcount using the existing typecount).

One problem then is how to make all the callers of the p2m update
functions handle failure, either by waiting (new deadlock risk?) or
returning EAGAIN at the hypercall interface.  Paths where the update
isn''t caused by an explicit request (like log-dirty and the mem-event
rx-to-rw conversion) would be particularly tricky.  

More seriously, it introduces another round of the sort of priority
inversion we already get with the existing refcounts - a foreign
mapping, caused by a user-space program in another VM, could arbitrarily
delay a p2m update (and so prevent a VCPU from making progress), without
any mechanism to even request that the mapping be removed.

Any ideas how to avoid that?  Potentially with some extra bookkeeping on
foreign mappings we could revoke or redirect them when the p2m changes.
That would fit nicely with the abstraction in the interfaces where HVM
domains'' memory is always indexed by pfn.  I can imagine it being quite
tricky though.
> I''m more wary that turning p2m locking into read/write will result
in
> code deadlocking itself: taking a read lock first and a write lock
> later. Possibly the current rwlock implementation could be improved to
> keep a cpumask of read-lockers, and provide an atomic "promote from
> read to write" atomic operation (something along the lines of wait
> until you''re the only reader in the cpumask, and then
cmpxchg(lock,
> -1, WRITE_BIAS))
I think that would deadlock if two cpus tried it at once.

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2011-Oct-10 10:06 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

On 10/10/2011 10:21, "Tim Deegan" <tim@xen.org> wrote:
> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote:
>> I have a proposal. I''d like to hear from the list what they
think.
>> 
>> - 1. change p2m lock to a read/write lock
>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
>> callers of p2m_lock will become write lockers.
>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
>> while holding the read lock.
>> - 4. Have all lookup callers put_page on the obtained mfn, once done.
> 
> This seems like a step in the right direction, but if we''re going
to
> make this big an interface change there might be better interfaces to
> end up with.
> 
> A few issues I can see with it:
>  - p2m lookups are on some very performance-sensitive paths
>    (e.g. multiple times in any pagetable walk or instruction emulation
>    in a HVM guest) so adding the rwlock might have a noticeable impact.
If the read sections are short, may as well use a plain spinlock.

The best (but hard) way to make the locking cheaper is to work out a way to
use finer-grained locks (e.g., per-page / per-mapping) or avoid locks
altogether (e.g., RCU).

Multi-reader locks are rarely going to be a good choice in the hypervisor.

A good first step anyhow would be to make the p2m_ synchronisation correct,
and then optimise it. Sounds like that is hard enough. :-)

 -- Keir
>  - This fixes one class of races (page gets freed-to-xen underfoot) but
>    leaves another one (gfn -> mfn map changes underfoot) untouched.  In
>    particular it doesn''t solve the race where a foreign mapper
>    gets a r/w map of what''s meant to be a read-only frame.
> 
> I think that to fix things properly we need to have some refcount
> associated with the p2m mapping itself.  That would be taken by all
> lookups (or at least some - we could have a flag to the p2m lookup) and
> released as you suggest, but more importantly it would block all p2m
changes
> while the count was raised.  (I think that a least in the common case we
> could encode such a refcount using the existing typecount).
> 
> One problem then is how to make all the callers of the p2m update
> functions handle failure, either by waiting (new deadlock risk?) or
> returning EAGAIN at the hypercall interface.  Paths where the update
> isn''t caused by an explicit request (like log-dirty and the
mem-event
> rx-to-rw conversion) would be particularly tricky.
> 
> More seriously, it introduces another round of the sort of priority
> inversion we already get with the existing refcounts - a foreign
> mapping, caused by a user-space program in another VM, could arbitrarily
> delay a p2m update (and so prevent a VCPU from making progress), without
> any mechanism to even request that the mapping be removed.
> 
> Any ideas how to avoid that?  Potentially with some extra bookkeeping on
> foreign mappings we could revoke or redirect them when the p2m changes.
> That would fit nicely with the abstraction in the interfaces where HVM
> domains'' memory is always indexed by pfn.  I can imagine it being
quite
> tricky though.
> 
>> I''m more wary that turning p2m locking into read/write will
result in
>> code deadlocking itself: taking a read lock first and a write lock
>> later. Possibly the current rwlock implementation could be improved to
>> keep a cpumask of read-lockers, and provide an atomic "promote
from
>> read to write" atomic operation (something along the lines of wait
>> until you''re the only reader in the cpumask, and then
cmpxchg(lock,
>> -1, WRITE_BIAS))
> 
> I think that would deadlock if two cpus tried it at once.
> 
> Tim.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Andres Lagar Cavilla

2011-Oct-10 19:17 UTC

head link

[Xen-devel] Re: Re: mapping problems in xenpaging

Hi Tim,

On Mon, Oct 10, 2011 at 5:21 AM, Tim Deegan <tim@xen.org>
wrote:> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote:
>> I have a proposal. I''d like to hear from the list what they
think.
>>
>> - 1. change p2m lock to a read/write lock
>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All current
>> callers of p2m_lock will become write lockers.
>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn obtained,
>> while holding the read lock.
>> - 4. Have all lookup callers put_page on the obtained mfn, once done.
>
> This seems like a step in the right direction, but if we''re going
to
> make this big an interface change there might be better interfaces to
> end up with.Agree, and hopefully we can converge towards something awesome.
> A few issues I can see with it:
>  - p2m lookups are on some very performance-sensitive paths
>   (e.g. multiple times in any pagetable walk or instruction emulation
>   in a HVM guest) so adding the rwlock might have a noticeable impact.
>
>  - This fixes one class of races (page gets freed-to-xen underfoot) but
>   leaves another one (gfn -> mfn map changes underfoot) untouched.  In
>   particular it doesn''t solve the race where a foreign mapper
>   gets a r/w map of what''s meant to be a read-only frame.
Can you elaborate a bit? Under what situations does the gfn->mfn map
change underfoot (other than sharing, paging in or superpage
sharding)? Wouldn''t those two be taking a writer lock, and thus be
mutually ecxluded from lookups?

Also, the second problem you mention (foregin rw map of ro page) seems
to be a tad different. That fix should go into get_page_from_l1e,
right? Isn''t qemu allowed to do this all the time?
> I think that to fix things properly we need to have some refcount
> associated with the p2m mapping itself.  That would be taken by all
> lookups (or at least some - we could have a flag to the p2m lookup) and
> released as you suggest, but more importantly it would block all p2m
changes
> while the count was raised.  (I think that a least in the common case we
> could encode such a refcount using the existing typecount).
Assuming mapping means "entry in p2m", multiple mappings would have
their ref count collapse in the page typecount. Isn''t that a problem?
Do we need per-mapping refcounts, or rather, per mapping mutual
exclusion? My feel is that page refcounts are necessary to prevent the
page from disappearing, and mappings need to have their lookups and
modifications synchronized.
> One problem then is how to make all the callers of the p2m update
> functions handle failure, either by waiting (new deadlock risk?) or
> returning EAGAIN at the hypercall interface.  Paths where the update
> isn''t caused by an explicit request (like log-dirty and the
mem-event
> rx-to-rw conversion) would be particularly tricky.
Callers already wait on lock_p2m. They''ll wait longer :)

On failure, to cite a specific example, if paging was trying to swap
something out that got foreign-mapped by somebody else, then yeah,
there''s no other option than failing that call.

How would log-dirty and x^w fail (if the refcount increases before
they get exclusive access to the mapping)? They''re not trying to
change the mapping and/or make a page go away, rather they''re changing
the p2m permission.
> More seriously, it introduces another round of the sort of priority
> inversion we already get with the existing refcounts - a foreign
> mapping, caused by a user-space program in another VM, could arbitrarily
> delay a p2m update (and so prevent a VCPU from making progress), without
> any mechanism to even request that the mapping be removed.
Yeah, that''s tricky. I do not know if there is a fix at all.
Fundamentally, the foreign mapper (some dom0 sharing/paging/foo
utility) is completely async to the domain. Even if we were to revoke
the foreign mapping as you suggest below, that would involve an upcall
into the foreign-mapper-guest-OS to have it cleanly neuter the mapping
and drop the refcount. That''s not at all trivial! Perhaps foreign
mapping vma''s should be taught to patiently re-establish mappings if
they disappear under their feet? Event then, you need to keep track of
those foreign l1e''s, and nothing short of a list will do.

Because this is performance rather than correctness I''m inclined to
not poke the beast.
> Any ideas how to avoid that?  Potentially with some extra bookkeeping on
> foreign mappings we could revoke or redirect them when the p2m changes.
> That would fit nicely with the abstraction in the interfaces where HVM
> domains'' memory is always indexed by pfn.  I can imagine it being
quite
> tricky though.
>
>> I''m more wary that turning p2m locking into read/write will
result in
>> code deadlocking itself: taking a read lock first and a write lock
>> later. Possibly the current rwlock implementation could be improved to
>> keep a cpumask of read-lockers, and provide an atomic "promote
from
>> read to write" atomic operation (something along the lines of wait
>> until you''re the only reader in the cpumask, and then
cmpxchg(lock,
>> -1, WRITE_BIAS))
>
> I think that would deadlock if two cpus tried it at once.
If you keep a cpumask of all read lockers, and only try to promote if
you''re the only read locker, it wouldn''t. But then
you''d have to
protect the cpumask from races with another spinlock. Yuk.

Which brings me to another question: p2m_locked_by_me (and others)
check on the physical cpu (lock->holder == curent->processor,
roughly). Well, what if you lock and then the vcpu migrates? Are vcpu
migrations prevented if you hold any locks? Or is there some other
magic going on?

Thanks!
Andres>
> Tim.
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Andres Lagar Cavilla

2011-Oct-10 19:31 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Keir,

On Mon, Oct 10, 2011 at 6:06 AM, Keir Fraser <keir.xen@gmail.com>
wrote:> On 10/10/2011 10:21, "Tim Deegan" <tim@xen.org> wrote:
>
>> At 21:20 -0400 on 09 Oct (1318195224), Andres Lagar Cavilla wrote:
>>> I have a proposal. I''d like to hear from the list what
they think.
>>>
>>> - 1. change p2m lock to a read/write lock
>>> - 2. Make lookups (gfn_to_mfn_* family) take a read lock. All
current
>>> callers of p2m_lock will become write lockers.
>>> - 3. Change the gfn_to_mfn_* family to get_page on the mfn
obtained,
>>> while holding the read lock.
>>> - 4. Have all lookup callers put_page on the obtained mfn, once
done.
>>
>> This seems like a step in the right direction, but if we''re
going to
>> make this big an interface change there might be better interfaces to
>> end up with.
>>
>> A few issues I can see with it:
>>  - p2m lookups are on some very performance-sensitive paths
>>    (e.g. multiple times in any pagetable walk or instruction emulation
>>    in a HVM guest) so adding the rwlock might have a noticeable impact.
>
> If the read sections are short, may as well use a plain spinlock.
>
> The best (but hard) way to make the locking cheaper is to work out a way to
> use finer-grained locks (e.g., per-page / per-mapping) or avoid locks
> altogether (e.g., RCU).No clue about RCU. But the p2m tree structure lends itself naturally
to fine-grained locking. In fact, hierarchical locking given 2M and 1G
superpages.

Now, this moves all the locking into the specific p2m implementations,
ept and traditional pt. Do you think a test_and_set-style spinlock
could fit in the unused bits of a p2m entry. It would have scarce
debug information :) I don''t know if ept would freak out with someone
spinning on an entry it has loaded in the translation hardware.
Probably.

So, perhaps the most decent idea is to have a tree/array of locks on
the side. This would not have to live inside the ept/pt
implementation-specific layer. Although locking unaligned,
arbitrarily-sized ranges of pages (Does anyone do that? PoD?) would
become a big headache.
>
> Multi-reader locks are rarely going to be a good choice in the hypervisor.
>
> A good first step anyhow would be to make the p2m_ synchronisation correct,
> and then optimise it. Sounds like that is hard enough. :-)
Pigybacking another question: ultimately, if we get p2m sync correct,
paging can introduce arbitrary waits. Currently the code bails out,
rather ungracefully (e.g. hvm_copy). Is this what wait queues were
introduced for? Hasn''t that been implemented purely out of lack of
cycles, or something more fundamental awaits?

Thanks!
Andres>
>  -- Keir
>
>>  - This fixes one class of races (page gets freed-to-xen underfoot) but
>>    leaves another one (gfn -> mfn map changes underfoot) untouched.
 In
>>    particular it doesn''t solve the race where a foreign mapper
>>    gets a r/w map of what''s meant to be a read-only frame.
>>
>> I think that to fix things properly we need to have some refcount
>> associated with the p2m mapping itself.  That would be taken by all
>> lookups (or at least some - we could have a flag to the p2m lookup) and
>> released as you suggest, but more importantly it would block all p2m
changes
>> while the count was raised.  (I think that a least in the common case
we
>> could encode such a refcount using the existing typecount).
>>
>> One problem then is how to make all the callers of the p2m update
>> functions handle failure, either by waiting (new deadlock risk?) or
>> returning EAGAIN at the hypercall interface.  Paths where the update
>> isn''t caused by an explicit request (like log-dirty and the
mem-event
>> rx-to-rw conversion) would be particularly tricky.
>>
>> More seriously, it introduces another round of the sort of priority
>> inversion we already get with the existing refcounts - a foreign
>> mapping, caused by a user-space program in another VM, could
arbitrarily
>> delay a p2m update (and so prevent a VCPU from making progress),
without
>> any mechanism to even request that the mapping be removed.
>>
>> Any ideas how to avoid that?  Potentially with some extra bookkeeping
on
>> foreign mappings we could revoke or redirect them when the p2m changes.
>> That would fit nicely with the abstraction in the interfaces where HVM
>> domains'' memory is always indexed by pfn.  I can imagine it
being quite
>> tricky though.
>>
>>> I''m more wary that turning p2m locking into read/write
will result in
>>> code deadlocking itself: taking a read lock first and a write lock
>>> later. Possibly the current rwlock implementation could be improved
to
>>> keep a cpumask of read-lockers, and provide an atomic "promote
from
>>> read to write" atomic operation (something along the lines of
wait
>>> until you''re the only reader in the cpumask, and then
cmpxchg(lock,
>>> -1, WRITE_BIAS))
>>
>> I think that would deadlock if two cpus tried it at once.
>>
>> Tim.
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Olaf Hering

2011-Oct-11 06:04 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

On Mon, Oct 10, Andres Lagar Cavilla wrote:
> Pigybacking another question: ultimately, if we get p2m sync correct,
> paging can introduce arbitrary waits. Currently the code bails out,
> rather ungracefully (e.g. hvm_copy). Is this what wait queues were
> introduced for? Hasn''t that been implemented purely out of lack of
> cycles, or something more fundamental awaits?
I did not have the time yet to make use of that new feature.

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2011-Oct-13 15:02 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Hi,

At 15:17 -0400 on 10 Oct (1318259867), Andres Lagar Cavilla
wrote:> On Mon, Oct 10, 2011 at 5:21 AM, Tim Deegan <tim@xen.org> wrote:
> > A few issues I can see with it:
> >  - p2m lookups are on some very performance-sensitive paths
> >   (e.g. multiple times in any pagetable walk or instruction emulation
> >   in a HVM guest) so adding the rwlock might have a noticeable impact.
> >
> >  - This fixes one class of races (page gets freed-to-xen underfoot)
but
> >   leaves another one (gfn -> mfn map changes underfoot) untouched.
 In
> >   particular it doesn''t solve the race where a foreign mapper
> >   gets a r/w map of what''s meant to be a read-only frame.
> 
> Can you elaborate a bit? Under what situations does the gfn->mfn map
> change underfoot (other than sharing, paging in or superpage
> sharding)? Wouldn''t those two be taking a writer lock, and thus be
> mutually ecxluded from lookups?
Once CPU A has done a p2m lookup but before it is finished using the
result (so, maybe it has established a mapping, or is about to establish
a mapping, or is going to do some other thing like shadow a pagetable):
 - CPU B might run a log-dirty clean that makes the pfn read-only. 
   This is actually handled, for historical reasons, by having all 
   possible callers be aware of log-dirty - that son''t scale to 
   all other p2m actions, though. :(
 - CPU B might remove the p2m mapping entirely and free the page. 
   This is the case that your suggestion handles, by holding 
   a refcount on the old MFN.
 - CPU B might make the page read-only so it can be shared with another
   VM.
 - CPU B might move the MFN to another PFN (this happens with emulated 
   video RAM, I believe).

> > I think that to fix things properly we need to have some refcount
> > associated with the p2m mapping itself.  That would be taken by all
> > lookups (or at least some - we could have a flag to the p2m lookup)
and
> > released as you suggest, but more importantly it would block all p2m
changes
> > while the count was raised.  (I think that a least in the common case
we
> > could encode such a refcount using the existing typecount).
> 
> Assuming mapping means "entry in p2m", multiple mappings would
have
> their ref count collapse in the page typecount. Isn''t that a
problem?
It might be.  I think the only case where multiple p2m mappings point to
the same MFN is page-sharing, which is already a special case; but it
might make sense to have the refcounts per-pfn anyway, just for
clarity.  
> Do we need per-mapping refcounts, or rather, per mapping mutual
> exclusion? My feel is that page refcounts are necessary to prevent the
> page from disappearing, and mappings need to have their lookups and
> modifications synchronized.
> 
> > One problem then is how to make all the callers of the p2m update
> > functions handle failure, either by waiting (new deadlock risk?) or
> > returning EAGAIN at the hypercall interface.  Paths where the update
> > isn''t caused by an explicit request (like log-dirty and the
mem-event
> > rx-to-rw conversion) would be particularly tricky.
> 
> Callers already wait on lock_p2m. They''ll wait longer :)
:) 
> On failure, to cite a specific example, if paging was trying to swap
> something out that got foreign-mapped by somebody else, then yeah,
> there''s no other option than failing that call.
> 
> How would log-dirty and x^w fail (if the refcount increases before
> they get exclusive access to the mapping)? They''re not trying to
> change the mapping and/or make a page go away, rather they''re
changing
> the p2m permission.
Good point.  But x^w would have to fail (or at least wait) if
x86_emulate on another CPU was doing an instruction-fetch.  Otherwise
there''s a race where the page becomes writeable and is written to
before
the instruction fetch completes.
> > More seriously, it introduces another round of the sort of priority
> > inversion we already get with the existing refcounts - a foreign
> > mapping, caused by a user-space program in another VM, could
arbitrarily
> > delay a p2m update (and so prevent a VCPU from making progress),
without
> > any mechanism to even request that the mapping be removed.
> 
> Yeah, that''s tricky. I do not know if there is a fix at all.
> Fundamentally, the foreign mapper (some dom0 sharing/paging/foo
> utility) is completely async to the domain. Even if we were to revoke
> the foreign mapping as you suggest below, that would involve an upcall
> into the foreign-mapper-guest-OS to have it cleanly neuter the mapping
> and drop the refcount. That''s not at all trivial! Perhaps foreign
> mapping vma''s should be taught to patiently re-establish mappings
if
> they disappear under their feet? Event then, you need to keep track of
> those foreign l1e''s, and nothing short of a list will do.
> 
> Because this is performance rather than correctness I''m inclined
to
> not poke the beast.
I''m inclined to agree.  Maybe the right thing to do is implement it and
see whether serious problems arise. 
> >> I''m more wary that turning p2m locking into read/write
will result in
> >> code deadlocking itself: taking a read lock first and a write lock
> >> later. Possibly the current rwlock implementation could be
improved to
> >> keep a cpumask of read-lockers, and provide an atomic
"promote from
> >> read to write" atomic operation (something along the lines of
wait
> >> until you''re the only reader in the cpumask, and then
cmpxchg(lock,
> >> -1, WRITE_BIAS))
> >
> > I think that would deadlock if two cpus tried it at once.
> 
> If you keep a cpumask of all read lockers, and only try to promote if
> you''re the only read locker, it wouldn''t. But then
you''d have to
> protect the cpumask from races with another spinlock. Yuk.
If two CPUs both take read locks and both want to promote, they''ll
deadlock waiting for each other to go away.  
> Which brings me to another question: p2m_locked_by_me (and others)
> check on the physical cpu (lock->holder == curent->processor,
> roughly). Well, what if you lock and then the vcpu migrates? Are vcpu
> migrations prevented if you hold any locks? Or is there some other
> magic going on?
When that code was written, vcpus could not be preempted in the
hypervisor (except in the scheduler softirq handler).  Now they can
voluntarily preempt, but not while holding any locks, so it''s still OK.

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2011-Oct-13 15:08 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Hi, 

At 15:31 -0400 on 10 Oct (1318260666), Andres Lagar Cavilla
wrote:> On Mon, Oct 10, 2011 at 6:06 AM, Keir Fraser <keir.xen@gmail.com>
wrote:
> > On 10/10/2011 10:21, "Tim Deegan" <tim@xen.org> wrote:
> > The best (but hard) way to make the locking cheaper is to work out a
way to
> > use finer-grained locks (e.g., per-page / per-mapping) or avoid locks
> > altogether (e.g., RCU).
> No clue about RCU. But the p2m tree structure lends itself naturally
> to fine-grained locking. In fact, hierarchical locking given 2M and 1G
> superpages.
> 
> Now, this moves all the locking into the specific p2m implementations,
> ept and traditional pt. Do you think a test_and_set-style spinlock
> could fit in the unused bits of a p2m entry. It would have scarce
> debug information :) I don''t know if ept would freak out with
someone
> spinning on an entry it has loaded in the translation hardware.
> Probably.
I think it would be OK on EPT and on 64-bit, where there are enough
available bits in a PTE.  32-bit PTEs are full, though.  It might clash
with the AMD IOMMU as well.  ISTR that they use a different set of avail
bits so when you''re using the same table for both NPT and IOMMU you
have
very few spare bits. 
> So, perhaps the most decent idea is to have a tree/array of locks on
> the side. This would not have to live inside the ept/pt
> implementation-specific layer. Although locking unaligned,
> arbitrarily-sized ranges of pages (Does anyone do that? PoD?) would
> become a big headache.
I don''t think anything does that, so having a tree of locks should work
fine (but might be a bit delicate to get right).  But as Keir says, we
can implement the refcounting of p2m entries first, with a single p2m
lock, and optimise afterwards -- I''m sure there will be some good way
of
reducing contention.

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Andres Lagar Cavilla

2011-Oct-13 17:59 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Good stuff Tim, let me summarize:


- The key is to obtain exclusive access to a p2m entry, or range [gfn,
gfn + 1<<order). This exclusive access lasts beyond the actual lookup,
until the caller is finished with modifications, to prevent the p2m
mapping changing underfoot.

- bits for either fine-grain locks or refcounts need to be set aside.
Stuffing those bits in actual p2m entries will be very error prone/not
possible, given all existing implementations (NPT+IOMMU, 32bit, etc).
So, we''re stuck with extra space overhead for a fine-grained p2m
concurrency control structure.

- Unless the recount collapses into the page_info struct. Even then
there is a critical section "get p2m_entry then get_page" that needs
to execute atomically.

- foreign mappings can block p2m actions for arbitrarily long. This
doesn''t usually happen, but the risk is latent. This is "hard to
solve", for now.

question 1: I still don''t see the need for refcounts. If you want to
prevent changes underfoot, you need to lock the entry, and that''s it.
In all the cases you explained, somebody would have to wait until the
refcount on the entry drops to reflect they are the only holder. This
is akin to being locked out.

question 2: although internal hypervisor code paths do not seem to act
on unaligned p2m ranges, external calls (e.g. MEMF_populate_on_demand)
could possibly pass unaligned ranges. These complicate fine-grain
concurrency. Should we fail those? With so many toolstacks out there,
I feel very hesitant.

question 3: is there any way to know a priori the max gfn a domain
will have? Can we pre-allocate the concurrency control structure as
opposed to demand allocating it?

suggestion 1: bake exclusive access in the current calls. A p2m
lookup, followed by a p2m set_entry, delimit a critical section for
that range of p2m mappings. p2m lookups without closing set_entry will
have to issue a call to drop exclusive access on the range of
mappings.

suggestion 2: limit fine granularity (if locking, not refcounting), to
2MB superpages. Saves space. 512 neighbours can surely coexist without
locking each other out :)

Thanks,
Andres

On Thu, Oct 13, 2011 at 11:02 AM, Tim Deegan <tim@xen.org>
wrote:> Hi,
>
> At 15:17 -0400 on 10 Oct (1318259867), Andres Lagar Cavilla wrote:
>> On Mon, Oct 10, 2011 at 5:21 AM, Tim Deegan <tim@xen.org> wrote:
>> > A few issues I can see with it:
>> >  - p2m lookups are on some very performance-sensitive paths
>> >   (e.g. multiple times in any pagetable walk or instruction
emulation
>> >   in a HVM guest) so adding the rwlock might have a noticeable
impact.
>> >
>> >  - This fixes one class of races (page gets freed-to-xen
underfoot) but
>> >   leaves another one (gfn -> mfn map changes underfoot)
untouched.  In
>> >   particular it doesn''t solve the race where a foreign
mapper
>> >   gets a r/w map of what''s meant to be a read-only frame.
>>
>> Can you elaborate a bit? Under what situations does the gfn->mfn map
>> change underfoot (other than sharing, paging in or superpage
>> sharding)? Wouldn''t those two be taking a writer lock, and
thus be
>> mutually ecxluded from lookups?
>
> Once CPU A has done a p2m lookup but before it is finished using the
> result (so, maybe it has established a mapping, or is about to establish
> a mapping, or is going to do some other thing like shadow a pagetable):
>  - CPU B might run a log-dirty clean that makes the pfn read-only.
>   This is actually handled, for historical reasons, by having all
>   possible callers be aware of log-dirty - that son''t scale to
>   all other p2m actions, though. :(
>  - CPU B might remove the p2m mapping entirely and free the page.
>   This is the case that your suggestion handles, by holding
>   a refcount on the old MFN.
>  - CPU B might make the page read-only so it can be shared with another
>   VM.
>  - CPU B might move the MFN to another PFN (this happens with emulated
>   video RAM, I believe).
>
>
>> > I think that to fix things properly we need to have some refcount
>> > associated with the p2m mapping itself.  That would be taken by
all
>> > lookups (or at least some - we could have a flag to the p2m
lookup) and
>> > released as you suggest, but more importantly it would block all
p2m changes
>> > while the count was raised.  (I think that a least in the common
case we
>> > could encode such a refcount using the existing typecount).
>>
>> Assuming mapping means "entry in p2m", multiple mappings
would have
>> their ref count collapse in the page typecount. Isn''t that a
problem?
>
> It might be.  I think the only case where multiple p2m mappings point to
> the same MFN is page-sharing, which is already a special case; but it
> might make sense to have the refcounts per-pfn anyway, just for
> clarity.
>
>> Do we need per-mapping refcounts, or rather, per mapping mutual
>> exclusion? My feel is that page refcounts are necessary to prevent the
>> page from disappearing, and mappings need to have their lookups and
>> modifications synchronized.
>>
>> > One problem then is how to make all the callers of the p2m update
>> > functions handle failure, either by waiting (new deadlock risk?)
or
>> > returning EAGAIN at the hypercall interface.  Paths where the
update
>> > isn''t caused by an explicit request (like log-dirty and
the mem-event
>> > rx-to-rw conversion) would be particularly tricky.
>>
>> Callers already wait on lock_p2m. They''ll wait longer :)
>
> :)
>
>> On failure, to cite a specific example, if paging was trying to swap
>> something out that got foreign-mapped by somebody else, then yeah,
>> there''s no other option than failing that call.
>>
>> How would log-dirty and x^w fail (if the refcount increases before
>> they get exclusive access to the mapping)? They''re not trying
to
>> change the mapping and/or make a page go away, rather they''re
changing
>> the p2m permission.
>
> Good point.  But x^w would have to fail (or at least wait) if
> x86_emulate on another CPU was doing an instruction-fetch.  Otherwise
> there''s a race where the page becomes writeable and is written to
before
> the instruction fetch completes.
>
>> > More seriously, it introduces another round of the sort of
priority
>> > inversion we already get with the existing refcounts - a foreign
>> > mapping, caused by a user-space program in another VM, could
arbitrarily
>> > delay a p2m update (and so prevent a VCPU from making progress),
without
>> > any mechanism to even request that the mapping be removed.
>>
>> Yeah, that''s tricky. I do not know if there is a fix at all.
>> Fundamentally, the foreign mapper (some dom0 sharing/paging/foo
>> utility) is completely async to the domain. Even if we were to revoke
>> the foreign mapping as you suggest below, that would involve an upcall
>> into the foreign-mapper-guest-OS to have it cleanly neuter the mapping
>> and drop the refcount. That''s not at all trivial! Perhaps
foreign
>> mapping vma''s should be taught to patiently re-establish
mappings if
>> they disappear under their feet? Event then, you need to keep track of
>> those foreign l1e''s, and nothing short of a list will do.
>>
>> Because this is performance rather than correctness I''m
inclined to
>> not poke the beast.
>
> I''m inclined to agree.  Maybe the right thing to do is implement
it and
> see whether serious problems arise.
>
>> >> I''m more wary that turning p2m locking into
read/write will result in
>> >> code deadlocking itself: taking a read lock first and a write
lock
>> >> later. Possibly the current rwlock implementation could be
improved to
>> >> keep a cpumask of read-lockers, and provide an atomic
"promote from
>> >> read to write" atomic operation (something along the
lines of wait
>> >> until you''re the only reader in the cpumask, and then
cmpxchg(lock,
>> >> -1, WRITE_BIAS))
>> >
>> > I think that would deadlock if two cpus tried it at once.
>>
>> If you keep a cpumask of all read lockers, and only try to promote if
>> you''re the only read locker, it wouldn''t. But then
you''d have to
>> protect the cpumask from races with another spinlock. Yuk.
>
> If two CPUs both take read locks and both want to promote, they''ll
> deadlock waiting for each other to go away.
>
>> Which brings me to another question: p2m_locked_by_me (and others)
>> check on the physical cpu (lock->holder == curent->processor,
>> roughly). Well, what if you lock and then the vcpu migrates? Are vcpu
>> migrations prevented if you hold any locks? Or is there some other
>> magic going on?
>
> When that code was written, vcpus could not be preempted in the
> hypervisor (except in the scheduler softirq handler).  Now they can
> voluntarily preempt, but not while holding any locks, so it''s
still OK.
>
> Cheers,
>
> Tim.
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2011-Oct-20 10:13 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

At 13:59 -0400 on 13 Oct (1318514390), Andres Lagar Cavilla
wrote:> Good stuff Tim, let me summarize:
> 
> 
> - The key is to obtain exclusive access to a p2m entry, or range [gfn,
> gfn + 1<<order). This exclusive access lasts beyond the actual
lookup,
> until the caller is finished with modifications, to prevent the p2m
> mapping changing underfoot.
Yes.  It only excludes concurrent updates, not concurrent lookups, so in
that way it''s effectively a per-range MRSW lock, implemented with
refcounts.  (I feel like I''m working around in a circle to your first
suggestion!)
> - bits for either fine-grain locks or refcounts need to be set aside.
> Stuffing those bits in actual p2m entries will be very error prone/not
> possible, given all existing implementations (NPT+IOMMU, 32bit, etc).
> So, we''re stuck with extra space overhead for a fine-grained p2m
> concurrency control structure.
Yes.
> - Unless the recount collapses into the page_info struct. Even then
> there is a critical section "get p2m_entry then get_page" that
needs
> to execute atomically.
True, and since you only get the page struct after the p2m lookup
that''s
tricky.
> - foreign mappings can block p2m actions for arbitrarily long. This
> doesn''t usually happen, but the risk is latent. This is "hard
to
> solve", for now.
Yes.
> question 1: I still don''t see the need for refcounts. If you want
to
> prevent changes underfoot, you need to lock the entry, and that''s
it.
> In all the cases you explained, somebody would have to wait until the
> refcount on the entry drops to reflect they are the only holder. This
> is akin to being locked out.
It should be possible for multiple clients to look up and use the same
p2m entry (e.g. Qemu having a mapping of a guest frame shouldn''t stop
x86_emulate from reading or writing that memory, though both of those
should stop any concurrent p2m update to the gfn).
> question 2: although internal hypervisor code paths do not seem to act
> on unaligned p2m ranges, external calls (e.g. MEMF_populate_on_demand)
> could possibly pass unaligned ranges. These complicate fine-grain
> concurrency. Should we fail those? With so many toolstacks out there,
> I feel very hesitant.
Hmm.  Most operations that touch large numbers of frames already have a
partial-success return path (or at least stop-and-retry) to avoid
long-running operations starving timers, softirqs etc.  If there are
paths that don''t do this, maybe they should. :)
> question 3: is there any way to know a priori the max gfn a domain
> will have? Can we pre-allocate the concurrency control structure as
> opposed to demand allocating it?
Not any sensible maximum, no, and gfn sapace can be sparse so it might
not make sense to allocate it all up front anyway.  But the p2m
structures themselves are allocated on demand so the extra bookkeeping
space can run alongside them.
> suggestion 1: bake exclusive access in the current calls. A p2m
> lookup, followed by a p2m set_entry, delimit a critical section for
> that range of p2m mappings. p2m lookups without closing set_entry will
> have to issue a call to drop exclusive access on the range of
> mappings.
As I said above, it shouldn''t be exclusive with other _lookups_, only
with updates.  But I have no objection to adding a flag to the lookup
function that lets the caller choose "lock for update" vs "lock
for
lookup".
> suggestion 2: limit fine granularity (if locking, not refcounting), to
> 2MB superpages. Saves space. 512 neighbours can surely coexist without
> locking each other out :)
Sure; if that turns out to cause a lot of contention it can be changed
later. 

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Andres Lagar Cavilla

2011-Oct-20 14:54 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

On Thu, Oct 20, 2011 at 6:13 AM, Tim Deegan <tim@xen.org>
wrote:> At 13:59 -0400 on 13 Oct (1318514390), Andres Lagar Cavilla wrote:
>> Good stuff Tim, let me summarize:
>>
>>
>> - The key is to obtain exclusive access to a p2m entry, or range [gfn,
>> gfn + 1<<order). This exclusive access lasts beyond the actual
lookup,
>> until the caller is finished with modifications, to prevent the p2m
>> mapping changing underfoot.
>
> Yes.  It only excludes concurrent updates, not concurrent lookups, so in
> that way it''s effectively a per-range MRSW lock, implemented with
> refcounts.  (I feel like I''m working around in a circle to your
first
> suggestion!)
That''s great because ... I''m working around in a circle 180
degrees opposite :)

I think it''s important to untangle page liveness from mapping mutex
access. That''s why I favor Keir''s approach of just locking the
thing,
no MSRW. Reasons:
1. Very common idiom throughout the code is to get_entry -> set_entry.
How do we do that in an MSRW, atomically?
2. You''re concerned about foreign mappings, rightly so. With mutex
access to the p2m mapping, we can ensure the page refcount increases
atomically in the context of building the foreign mapping. This will
keep the page alive and unable to be swapped/shared/whatever. We only
lock the p2m entry while building the p2m mapping.
3. Recursive locking for different purposes is just easier without
refcounts (generalization of reason 1)
4. Note that in your qemu/x86_emulate example, qemu''s mapping does not
prevent x86_emulate from progress, as qemu will have relinquished
locks once done building the foreign mapping.

I have a draft implementation of a "tree" of exclusive locks.
It''s
still a bit embarrassing to share :)
The API is more or less
get_p2m(p2m, gfn, order)
<do work>
put_p2m(p2m, gfn, order)
with recursive get allowed, (unsophisticated) deadlock detection, and
shortcuts for individual gfn and for global locking (for e.g.
log_dirty). Give me a couple days for an RFC post.

Thanks
Andres
>
>> - bits for either fine-grain locks or refcounts need to be set aside.
>> Stuffing those bits in actual p2m entries will be very error prone/not
>> possible, given all existing implementations (NPT+IOMMU, 32bit, etc).
>> So, we''re stuck with extra space overhead for a fine-grained
p2m
>> concurrency control structure.
>
> Yes.
>
>> - Unless the recount collapses into the page_info struct. Even then
>> there is a critical section "get p2m_entry then get_page"
that needs
>> to execute atomically.
>
> True, and since you only get the page struct after the p2m lookup
that''s
> tricky.
>
>> - foreign mappings can block p2m actions for arbitrarily long. This
>> doesn''t usually happen, but the risk is latent. This is
"hard to
>> solve", for now.
>
> Yes.
>
>> question 1: I still don''t see the need for refcounts. If you
want to
>> prevent changes underfoot, you need to lock the entry, and
that''s it.
>> In all the cases you explained, somebody would have to wait until the
>> refcount on the entry drops to reflect they are the only holder. This
>> is akin to being locked out.
>
> It should be possible for multiple clients to look up and use the same
> p2m entry (e.g. Qemu having a mapping of a guest frame shouldn''t
stop
> x86_emulate from reading or writing that memory, though both of those
> should stop any concurrent p2m update to the gfn).
>
>> question 2: although internal hypervisor code paths do not seem to act
>> on unaligned p2m ranges, external calls (e.g. MEMF_populate_on_demand)
>> could possibly pass unaligned ranges. These complicate fine-grain
>> concurrency. Should we fail those? With so many toolstacks out there,
>> I feel very hesitant.
>
> Hmm.  Most operations that touch large numbers of frames already have a
> partial-success return path (or at least stop-and-retry) to avoid
> long-running operations starving timers, softirqs etc.  If there are
> paths that don''t do this, maybe they should. :)
>
>> question 3: is there any way to know a priori the max gfn a domain
>> will have? Can we pre-allocate the concurrency control structure as
>> opposed to demand allocating it?
>
> Not any sensible maximum, no, and gfn sapace can be sparse so it might
> not make sense to allocate it all up front anyway.  But the p2m
> structures themselves are allocated on demand so the extra bookkeeping
> space can run alongside them.
>
>> suggestion 1: bake exclusive access in the current calls. A p2m
>> lookup, followed by a p2m set_entry, delimit a critical section for
>> that range of p2m mappings. p2m lookups without closing set_entry will
>> have to issue a call to drop exclusive access on the range of
>> mappings.
>
> As I said above, it shouldn''t be exclusive with other _lookups_,
only
> with updates.  But I have no objection to adding a flag to the lookup
> function that lets the caller choose "lock for update" vs
"lock for
> lookup".
>
>> suggestion 2: limit fine granularity (if locking, not refcounting), to
>> 2MB superpages. Saves space. 512 neighbours can surely coexist without
>> locking each other out :)
>
> Sure; if that turns out to cause a lot of contention it can be changed
> later.
>
> Cheers,
>
> Tim.
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tim Deegan

2011-Oct-20 15:31 UTC

head link

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

At 10:54 -0400 on 20 Oct (1319108045), Andres Lagar Cavilla
wrote:> On Thu, Oct 20, 2011 at 6:13 AM, Tim Deegan <tim@xen.org> wrote:
> > At 13:59 -0400 on 13 Oct (1318514390), Andres Lagar Cavilla wrote:
> >> Good stuff Tim, let me summarize:
> >>
> >>
> >> - The key is to obtain exclusive access to a p2m entry, or range
[gfn,
> >> gfn + 1<<order). This exclusive access lasts beyond the
actual lookup,
> >> until the caller is finished with modifications, to prevent the
p2m
> >> mapping changing underfoot.
> >
> > Yes.  It only excludes concurrent updates, not concurrent lookups, so
in
> > that way it''s effectively a per-range MRSW lock, implemented
with
> > refcounts.  (I feel like I''m working around in a circle to
your first
> > suggestion!)
> 
> That''s great because ... I''m working around in a circle
180 degrees opposite :)
> 
> I think it''s important to untangle page liveness from mapping
mutex
> access. That''s why I favor Keir''s approach of just
locking the thing,
> no MSRW. Reasons:
> 1. Very common idiom throughout the code is to get_entry -> set_entry.
> How do we do that in an MSRW, atomically?
Callers that want to do get/set would have to acquire the write lock
during the first lookup.
> 2. You''re concerned about foreign mappings, rightly so. With mutex
> access to the p2m mapping, we can ensure the page refcount increases
> atomically in the context of building the foreign mapping. This will
> keep the page alive and unable to be swapped/shared/whatever. We only
> lock the p2m entry while building the p2m mapping.
Ah - that''s where we differ.  As far as I can see, in order to be
effective, the p2m entry must remain locked until the foreign mapping is
destroyed.  Otherwise, later p2m updates can make the mapping stale.
> 3. Recursive locking for different purposes is just easier without
> refcounts (generalization of reason 1)
> 4. Note that in your qemu/x86_emulate example, qemu''s mapping does
not
> prevent x86_emulate from progress, as qemu will have relinquished
> locks once done building the foreign mapping.
> 
> I have a draft implementation of a "tree" of exclusive locks.
It''s
> still a bit embarrassing to share :)
> The API is more or less
> get_p2m(p2m, gfn, order)
> <do work>
> put_p2m(p2m, gfn, order)
> with recursive get allowed, (unsophisticated) deadlock detection, and
> shortcuts for individual gfn and for global locking (for e.g.
> log_dirty). Give me a couple days for an RFC post.
Great - actual code is always welcome! :)  I might not be able to give
it much attention before next Thursday though.

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Xen devel - Sep 2011 - mapping problems in xenpaging

[Xen-devel] mapping problems in xenpaging

[Xen-devel] Re: mapping problems in xenpaging

Re: [Xen-devel] Re: mapping problems in xenpaging

Re: [Xen-devel] Re: mapping problems in xenpaging

[Xen-devel] Re: mapping problems in xenpaging

Re: [Xen-devel] Re: mapping problems in xenpaging

Re: [Xen-devel] Re: mapping problems in xenpaging

Re: [Xen-devel] Re: mapping problems in xenpaging

[Xen-devel] Re: Re: mapping problems in xenpaging

[Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

[Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging

Re: [Xen-devel] Re: Re: mapping problems in xenpaging