Hi Grzegorz, As I understand, the purpose of the code in page_make_sharable() checking the ref count is to ensure that nobody unexpected is working on the page, and so we can migrate it to dom_cow, right? === /* Check if the ref count is 2. The first from PGT_allocated, and * the second from get_page_and_type at the top of this function */ if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) { /* Return type count back to zero */ put_page_and_type(page); spin_unlock(&d->page_alloc_lock); return -E2BIG; } === However, it seems to me that this ref check and the following page migration is not atomic( although the operations for type_info ref check seems good) i.e. it''s possible that it passed this ref check but just before it goes to dom_cow, someone else gets this page? As far as I know, in Linux kernel''s similar scenarios, they do ref-freezing tricks. And if we don''t need to worry about this racing case, then what is this check trying to ensure? Thanks, Nai
Hi, As I understand, the purpose of the code in page_make_sharable() checking the ref count is to ensure that nobody unexpected is working on the page, and so we can migrate it to dom_cow, right? === /* Check if the ref count is 2. The first from PGT_allocated, and * the second from get_page_and_type at the top of this function */ if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) { /* Return type count back to zero */ put_page_and_type(page); spin_unlock(&d->page_alloc_lock); return -E2BIG; } === However, it seems to me that this ref check and the following page migration is not atomic( although the operations for type_info ref check seems good) i.e. it''s possible that it passed this ref check but just before it goes to dom_cow, someone else gets this page? As far as I know, in Linux kernel''s similar scenarios, they do ref-freezing tricks. And if we don''t need to worry about this racing case, then what is this check trying to ensure? Thanks, Nai
At 22:43 +0800 on 16 Jan (1326753834), Nai Xia wrote:> Hi Grzegorz, > > As I understand, the purpose of the code in page_make_sharable() > checking the ref count is to ensure that nobody unexpected is working > on the page, and so we can migrate it to dom_cow, right? > > ===> /* Check if the ref count is 2. The first from PGT_allocated, and > * the second from get_page_and_type at the top of this function */ > if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) > { > /* Return type count back to zero */ > put_page_and_type(page); > spin_unlock(&d->page_alloc_lock); > return -E2BIG; > } > ===> > However, it seems to me that this ref check and the following page > migration is not atomic( although the operations for type_info ref > check seems good) i.e. it''s possible that it passed this ref > check but just before it goes to dom_cow, someone else gets this page?Yes, there are a number of races around the p2m code; Anders Lagar-Cavilla has been working on fixing the locking around p2m lookups to take care of this. Tim.
Hi all, Last time I reported this bug. And I now see some changes in your Xen git master branch. However, I think the problem still remains for ref-checking in page migration to dom_cow. I think I can construct a bug by interleaving the two code paths: in guest_remove_page() | in page_make_sharable() ------------------------------------------------------------------------------------------------------------------------------ if ( p2m_is_shared(p2mt) ) ..... ... ..... page = mfn_to_page(mfn); ..... ..... if ( !get_page_and_type(page, d, PGT_shared_page) ) // success ......... if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) ) // also pass if ( unlikely(!get_page(page, d)) ) /* go on to remove page */ /* go on to add page to cow domain */ ------------------------------------------------------------------------------------------------------------------------------------- is there anything that can already prevent such racing or is this really can happen? Thanks, Nai Xia On 2012年01月17日 18:53, Tim Deegan wrote:> At 22:43 +0800 on 16 Jan (1326753834), Nai Xia wrote: >> Hi Grzegorz, >> >> As I understand, the purpose of the code in page_make_sharable() >> checking the ref count is to ensure that nobody unexpected is working >> on the page, and so we can migrate it to dom_cow, right? >> >> ===>> /* Check if the ref count is 2. The first from PGT_allocated, and >> * the second from get_page_and_type at the top of this function */ >> if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) >> { >> /* Return type count back to zero */ >> put_page_and_type(page); >> spin_unlock(&d->page_alloc_lock); >> return -E2BIG; >> } >> ===>> >> However, it seems to me that this ref check and the following page >> migration is not atomic( although the operations for type_info ref >> check seems good) i.e. it's possible that it passed this ref >> check but just before it goes to dom_cow, someone else gets this page? > > Yes, there are a number of races around the p2m code; Anders > Lagar-Cavilla has been working on fixing the locking around p2m lookups > to take care of this. > > Tim. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Hi, At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote:> I think I can construct a bug by interleaving the two code paths: > > in guest_remove_page() | in page_make_sharable() > ------------------------------------------------------------------------------------------------------------------------------ > if ( p2m_is_shared(p2mt) ) ..... > ... ..... > page = mfn_to_page(mfn); ..... > ..... > > if ( > !get_page_and_type(page, > d, PGT_shared_page) ) > // success > > ......... > if ( page->count_info != > (PGC_allocated | (2 + > expected_refcnt)) ) // > also pass > > > if ( unlikely(!get_page(page, d)) ) > > /* go on to remove page */ /* go on to add page to > cow domain */ > ------------------------------------------------------------------------------------------------------------------------------------- > > > is there anything that can already prevent such racing or is this really > can happen?I think this race can happen. I''m not sure exactly what the effect is, though. I guess the page ends up belonging to dom_cow, but without the PGC_allocated bit set. So when it becomes unshared again, it''s immediately freed. :( Andres, what do you think? Tim.
Andres Lagar-Cavilla
2013-Jan-10 16:36 UTC
Re: Is this a racing bug in page_make_sharable()?
Hi there, thanks for the report. Sorry I didn''t respond earlier, fell through the cracks. Having said that... On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote:> Hi, > > At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote: >> I think I can construct a bug by interleaving the two code paths: >> >> in guest_remove_page() | in page_make_sharable() >> ------------------------------------------------------------------------------------------------------------------------------ >> if ( p2m_is_shared(p2mt) ) ..... >> ... ..... >> page = mfn_to_page(mfn); ..... >> ..... >> >> if ( >> !get_page_and_type(page, >> d, PGT_shared_page) ) >> // success >> >> ......... >> if ( page->count_info != >> (PGC_allocated | (2 + >> expected_refcnt)) ) // >> also pass >> >> >> if ( unlikely(!get_page(page, d)) ) >> >> /* go on to remove page */ /* go on to add page to >> cow domain */ >> ------------------------------------------------------------------------------------------------------------------------------------- >> >> >> is there anything that can already prevent such racing or is this really >> can happen? > > I think this race can happen.I beg to disagree. Both page_make_sharable and page_make_private are protected by the per-mfn page lock (see __grab_shared_page). This lock will serialize strictly page_make_sharable against the page_make_private in quest_remove_page -> mem_sharing_unshare_page -> page_make_private. So page_make_sharable will execute atomically, add to cow, and then unshare will come in, make the page private again, remove from cow and hand over to the domain for whom the page is being removed. Or, the page has other shared references in which case page_make_sharable is a no-op, and page_make_private will generate a new different page (mfn). Or, and this is the tricky case you refer to, page_make_private will complete before page_make_sharable kicks in. The question then is: how did page_make_sharable even got to be called? The mfn is being nominated for sharing. How? Through a p2m entry in a domain. Is this the same domain as the one for which quest_remove_page is executing? Then all is serialized through the p2m lock, no race. Is this a different domain? Then the page is already shared, no race as per above. Finally, is this a different domain and the page is not shared? This Should Not Be! As in, never happens with the current APIs, and IMHO is borderline uncheckable for. Hope that helps, Andres> I''m not sure exactly what the effect > is, though. I guess the page ends up belonging to dom_cow, but > without the PGC_allocated bit set. So when it becomes unshared again, > it''s immediately freed. :( > > Andres, what do you think? > > Tim.
At 11:36 -0500 on 10 Jan (1357817806), Andres Lagar-Cavilla wrote:> Hi there, thanks for the report. Sorry I didn''t respond earlier, fell through the cracks. > > Having said that... > On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote: > > > Hi, > > > > At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote: > >> I think I can construct a bug by interleaving the two code paths: > >> > >> in guest_remove_page() | in page_make_sharable() > >> ------------------------------------------------------------------------------------------------------------------------------ > >> if ( p2m_is_shared(p2mt) ) ..... > >> ... ..... > >> page = mfn_to_page(mfn); ..... > >> ..... > >> > >> if ( > >> !get_page_and_type(page, > >> d, PGT_shared_page) ) > >> // success > >> > >> ......... > >> if ( page->count_info != > >> (PGC_allocated | (2 + > >> expected_refcnt)) ) // > >> also pass > >> > >> > >> if ( unlikely(!get_page(page, d)) ) > >> > >> /* go on to remove page */ /* go on to add page to > >> cow domain */ > >> ------------------------------------------------------------------------------------------------------------------------------------- > >> > >> > >> is there anything that can already prevent such racing or is this really > >> can happen? > > > > I think this race can happen. > > Through a p2m entry in a domain. Is this the same domain as the one > for which quest_remove_page is executing? Then all is serialized > through the p2m lock, no race.Right, that''s what I was missing. Because guest_remove_page has called get_gfn_query on the gfn, and not yet called put_gfn(), page_make_sharable() can''t be running. All is well. :) Thanks, Tim.
On 2013年01月11日 01:25, Tim Deegan wrote:> At 11:36 -0500 on 10 Jan (1357817806), Andres Lagar-Cavilla wrote: >> Hi there, thanks for the report. Sorry I didn't respond earlier, fell through the cracks. >> >> Having said that... >> On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote: >> >>> Hi, >>> >>> At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote: >>>> I think I can construct a bug by interleaving the two code paths: >>>> >>>> in guest_remove_page() | in page_make_sharable() >>>> ------------------------------------------------------------------------------------------------------------------------------ >>>> if ( p2m_is_shared(p2mt) ) ..... >>>> ... ..... >>>> page = mfn_to_page(mfn); ..... >>>> ..... >>>> >>>> if ( >>>> !get_page_and_type(page, >>>> d, PGT_shared_page) ) >>>> // success >>>> >>>> ......... >>>> if ( page->count_info !>>>> (PGC_allocated | (2 + >>>> expected_refcnt)) ) // >>>> also pass >>>> >>>> >>>> if ( unlikely(!get_page(page, d)) ) >>>> >>>> /* go on to remove page */ /* go on to add page to >>>> cow domain */ >>>> ------------------------------------------------------------------------------------------------------------------------------------- >>>> >>>> >>>> is there anything that can already prevent such racing or is this really >>>> can happen? >>> >>> I think this race can happen. >> >> Through a p2m entry in a domain. Is this the same domain as the one >> for which quest_remove_page is executing? Then all is serialized >> through the p2m lock, no race. > > Right, that's what I was missing. Because guest_remove_page has called > get_gfn_query on the gfn, and not yet called put_gfn(), > page_make_sharable() can't be running. All is well. :)I see. Good to see everything working well. :) Thanks, Nai> > Thanks, > > Tim. > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel