The preemptable page type handling changes modified free_page_type() behavior without adjusting the call site in relinquish_memory(): Any type reference left pending when leaving hypercall handlers is associated with a page reference, and when successful free_page_type() decrements the type refcount - hence relinquish_memory() must now also drop the page reference. Also, the recursion avoidance during domain shutdown somehow (probably by me when I merged the patch up to a newer snapshot) got screwed up: The avoidance logic in mm.c should short circuit levels below the top one currently being processed, rather than the top one itself. Signed-off-by: Jan Beulich <jbeulich@novell.com> Index: 2008-10-27/xen/arch/x86/domain.c ==================================================================--- 2008-10-27.orig/xen/arch/x86/domain.c 2008-10-24 11:21:38.000000000 +0200 +++ 2008-10-27/xen/arch/x86/domain.c 2008-10-27 11:11:36.000000000 +0100 @@ -1687,6 +1687,7 @@ static int relinquish_memory( { if ( free_page_type(page, x, 0) != 0 ) BUG(); + put_page(page); break; } } Index: 2008-10-27/xen/arch/x86/mm.c ==================================================================--- 2008-10-27.orig/xen/arch/x86/mm.c 2008-10-17 08:29:02.000000000 +0200 +++ 2008-10-27/xen/arch/x86/mm.c 2008-10-27 11:11:36.000000000 +0100 @@ -1343,7 +1343,7 @@ static void free_l1_table(struct page_in static int free_l2_table(struct page_info *page, int preemptible) { -#ifdef CONFIG_COMPAT +#if defined(CONFIG_COMPAT) || defined(DOMAIN_DESTRUCT_AVOID_RECURSION) struct domain *d = page_get_owner(page); #endif unsigned long pfn = page_to_mfn(page); @@ -1351,6 +1351,11 @@ static int free_l2_table(struct page_inf unsigned int i = page->nr_validated_ptes - 1; int err = 0; +#ifdef DOMAIN_DESTRUCT_AVOID_RECURSION + if ( d->arch.relmem == RELMEM_l3 ) + return 0; +#endif + pl2e = map_domain_page(pfn); ASSERT(page->nr_validated_ptes); @@ -1381,7 +1386,7 @@ static int free_l3_table(struct page_inf int rc = 0; #ifdef DOMAIN_DESTRUCT_AVOID_RECURSION - if ( d->arch.relmem == RELMEM_l3 ) + if ( d->arch.relmem == RELMEM_l4 ) return 0; #endif @@ -1424,11 +1429,6 @@ static int free_l4_table(struct page_inf unsigned int i = page->nr_validated_ptes - !page->partial_pte; int rc = 0; -#ifdef DOMAIN_DESTRUCT_AVOID_RECURSION - if ( d->arch.relmem == RELMEM_l4 ) - return 0; -#endif - do { if ( is_guest_l4_slot(d, i) ) rc = put_page_from_l4e(pl4e[i], pfn, preemptible); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> "Jan Beulich" <jbeulich@novell.com> 27.10.08 12:36 >>> >The preemptable page type handling changes modified free_page_type() >behavior without adjusting the call site in relinquish_memory(): Any >type reference left pending when leaving hypercall handlers is >associated with a page reference, and when successful free_page_type() >decrements the type refcount - hence relinquish_memory() must now also >drop the page reference.After some more thinking, especially in the context of another bug (see below), I''m not certain anymore this is the right thing to do, even though the explanation continues to seem plausible to me. One part of my concern is that, when the issue this patch attempted to fix is seen, the code path gets entered only because of DOMAIN_DESTRUCT_AVOID_RECURSION, not because of either of the conditions mentioned in the preceding comment. Hence I''m worried that especially for circular ''linear page table'' references this may be wrong (but I don''t really know how to properly distinguish this case from the DOMAIN_DESTRUCT_AVOID_RECURSION leftovers). As to the PGT_partial case - with the current way this works, the put_page() seems to be wrong in any case. However, there''s a second (independent) bug in that it currently is possible during domain cleanup to encounter pages with PGT_partial set, their type reference count being 1, but their page reference count being 0. This is certainly wrong (it triggers the BUG_ON() in free_domheap_pages()). As PGT_partial is being handled so that the type refcount is left at 1, I think it must imply to also keep the respective page reference. Doing this at the call sites (i.e. be suppressing the put_page()) seems cumbersome, though, so I''m rather considering obtaining an extra reference when PGT_partial gets set, and dropping it when PGT_partial gets cleared. With that, the put_page() mentioned above would be correct again (as PGT_partial *is* being cleared there). Btw., the other part of that patch (i.e. the changes to xen/arch/x86/mm.c) should probably also go into the 3.3 tree. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28/10/08 08:22, "Jan Beulich" <jbeulich@novell.com> wrote:>>>> "Jan Beulich" <jbeulich@novell.com> 27.10.08 12:36 >>> >> The preemptable page type handling changes modified free_page_type() >> behavior without adjusting the call site in relinquish_memory(): Any >> type reference left pending when leaving hypercall handlers is >> associated with a page reference, and when successful free_page_type() >> decrements the type refcount - hence relinquish_memory() must now also >> drop the page reference. > > After some more thinking, especially in the context of another bug (see > below), I''m not certain anymore this is the right thing to do, even though > the explanation continues to seem plausible to me. One part of my concern > is that, when the issue this patch attempted to fix is seen, the code path > gets entered only because of DOMAIN_DESTRUCT_AVOID_RECURSION, not > because of either of the conditions mentioned in the preceding comment. > Hence I''m worried that especially for circular ''linear page table'' references > this may be wrong (but I don''t really know how to properly distinguish this > case from the DOMAIN_DESTRUCT_AVOID_RECURSION leftovers).Oh I see. Well it''s certainly bogus for circular references. Let''s say A refers to B and vice versa. Then free_page_type(A) will cause put_page_and_type(B) -> free_page_type(B) -> put_page_and_type(A) [this last function will note that A''s PGT_validated is clear and hence will not reenter free_page_type for A]. So the put_page() you added is not the correct fix for whetever issue you''ve been seeing. Raising the reference count on PGT_validated is certainly a possibility... That won''t make the put_page() in the circular-reference destructor correct though. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir.fraser@eu.citrix.com> 28.10.08 09:32 >>> >On 28/10/08 08:22, "Jan Beulich" <jbeulich@novell.com> wrote: > >>>>> "Jan Beulich" <jbeulich@novell.com> 27.10.08 12:36 >>> >>> The preemptable page type handling changes modified free_page_type() >>> behavior without adjusting the call site in relinquish_memory(): Any >>> type reference left pending when leaving hypercall handlers is >>> associated with a page reference, and when successful free_page_type() >>> decrements the type refcount - hence relinquish_memory() must now also >>> drop the page reference. >> >> After some more thinking, especially in the context of another bug (see >> below), I''m not certain anymore this is the right thing to do, even though >> the explanation continues to seem plausible to me. One part of my concern >> is that, when the issue this patch attempted to fix is seen, the code path >> gets entered only because of DOMAIN_DESTRUCT_AVOID_RECURSION, not >> because of either of the conditions mentioned in the preceding comment. >> Hence I''m worried that especially for circular ''linear page table'' references >> this may be wrong (but I don''t really know how to properly distinguish this >> case from the DOMAIN_DESTRUCT_AVOID_RECURSION leftovers). > >Oh I see. Well it''s certainly bogus for circular references. Let''s say A >refers to B and vice versa. Then free_page_type(A) will cause >put_page_and_type(B) -> free_page_type(B) -> put_page_and_type(A) [this last >function will note that A''s PGT_validated is clear and hence will not >reenter free_page_type for A]. > >So the put_page() you added is not the correct fix for whetever issue you''ve >been seeing. Raising the reference count on PGT_validated is certainly a >possibility... That won''t make the put_page() in the circular-reference >destructor correct though. :-)But how would one distinguish the two (or three at present, due to DOMAIN_DESTRUCT_AVOID_RECURSION) cases? Also, where is the matching put_page() for the type refcount decrement in free_page_type() in the circular case (in free_page_type(A) -> put_page_and_type(B) -> free_page_type(B) -> put_page_and_type(A) we''ll have the type refcount decremented twice, but the page refcount just once)? Or is this decrement invalid in that case (I don''t think it is, as get_lN_linear_pagetable() increments it along with keeping the page reference it obtained in the success case, but if it is, it again poses the question of how to recognize that case)? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28/10/08 09:26, "Jan Beulich" <jbeulich@novell.com> wrote:> But how would one distinguish the two (or three at present, due to > DOMAIN_DESTRUCT_AVOID_RECURSION) cases?I''m not sure. Do they need to be distinguished?> Also, where is the > matching put_page() for the type refcount decrement in free_page_type() > in the circular case (in free_page_type(A) -> put_page_and_type(B) -> > free_page_type(B) -> put_page_and_type(A) we''ll have the type refcount > decremented twice, but the page refcount just once)? Or is this decrement > invalid in that case (I don''t think it is, as get_lN_linear_pagetable() > increments it along with keeping the page reference it obtained in the > success case, but if it is, it again poses the question of how to recognize > that case)?Ah, looks like it''s been broken since the preemptible page_type patch went in. Perhaps the tail of free_page_type() should go into __put_page_type(), as it''s not needed by the call site in relinquish_memory(): the caller doesn''t really hold a type reference to be dropped; and the logic for being preempted doesn''t apply since relinquish_memory() requests no preemption. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir.fraser@eu.citrix.com> 28.10.08 10:36 >>> >On 28/10/08 09:26, "Jan Beulich" <jbeulich@novell.com> wrote: > >> But how would one distinguish the two (or three at present, due to >> DOMAIN_DESTRUCT_AVOID_RECURSION) cases? > >I''m not sure. Do they need to be distinguished?At least the PGT_partial case needs to be, especially if we agree that leaving a type reference without a page reference pending in that case is broken. Luckily, that case *is* easily recognizable.>> Also, where is the >> matching put_page() for the type refcount decrement in free_page_type() >> in the circular case (in free_page_type(A) -> put_page_and_type(B) -> >> free_page_type(B) -> put_page_and_type(A) we''ll have the type refcount >> decremented twice, but the page refcount just once)? Or is this decrement >> invalid in that case (I don''t think it is, as get_lN_linear_pagetable() >> increments it along with keeping the page reference it obtained in the >> success case, but if it is, it again poses the question of how to recognize >> that case)? > >Ah, looks like it''s been broken since the preemptible page_type patch went >in. Perhaps the tail of free_page_type() should go into __put_page_type(), >as it''s not needed by the call site in relinquish_memory(): the caller >doesn''t really hold a type reference to be dropped; and the logic for being >preempted doesn''t apply since relinquish_memory() requests no preemption.It doesn''t at present, but it should (in favor of DOMAIN_DESTRUCT_AVOID_RECURSION), including the put_page_and_type() earlier in that function. But of course, it may still turn out that cleaning up after preemption here must be handled differently from the __put_page_type() case. I''ll give moving that part (and removing the put_page() added yesterday) a try. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28/10/08 10:05, "Jan Beulich" <jbeulich@novell.com> wrote:>> Ah, looks like it''s been broken since the preemptible page_type patch went >> in. Perhaps the tail of free_page_type() should go into __put_page_type(), >> as it''s not needed by the call site in relinquish_memory(): the caller >> doesn''t really hold a type reference to be dropped; and the logic for being >> preempted doesn''t apply since relinquish_memory() requests no preemption. > > It doesn''t at present, but it should (in favor of > DOMAIN_DESTRUCT_AVOID_RECURSION), > including the put_page_and_type() earlier in that function. But of course, > it may still turn out that cleaning up after preemption here must be handled > differently from the __put_page_type() case. I''ll give moving that part > (and removing the put_page() added yesterday) a try.__put_page_type() is already a complex function actually, so let''s define a __put_final_page_type() containing a call to free_page_type() plus the current tail of free_page_type(). __put_page_type() can call that; relinquish memory can call free_page_type() directly. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir.fraser@eu.citrix.com> 28.10.08 11:25 >>> >On 28/10/08 10:05, "Jan Beulich" <jbeulich@novell.com> wrote: > >>> Ah, looks like it''s been broken since the preemptible page_type patch went >>> in. Perhaps the tail of free_page_type() should go into __put_page_type(), >>> as it''s not needed by the call site in relinquish_memory(): the caller >>> doesn''t really hold a type reference to be dropped; and the logic for being >>> preempted doesn''t apply since relinquish_memory() requests no preemption. >> >> It doesn''t at present, but it should (in favor of >> DOMAIN_DESTRUCT_AVOID_RECURSION), >> including the put_page_and_type() earlier in that function. But of course, >> it may still turn out that cleaning up after preemption here must be handled >> differently from the __put_page_type() case. I''ll give moving that part >> (and removing the put_page() added yesterday) a try. > >__put_page_type() is already a complex function actually, so let''s define a >__put_final_page_type() containing a call to free_page_type() plus the >current tail of free_page_type(). __put_page_type() can call that; >relinquish memory can call free_page_type() directly.Will do it that way for submission. In testing it with that code inlined in __put_page_type(), I can confirm that this closes the memory leak, but it (obviously) doesn''t address the crash when encountering a PGT_partial page hanging off of a page table being cleaned up by that explicit call to free_page_type() getting executed as a side effect of DOMAIN_DESTRUCT_AVOID_RECURSION. The question of course really is whether it''s worthwhile trying to fix that, or rather to do away with it altogether by utilizing the ''real'' preemption. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28/10/08 10:37, "Jan Beulich" <jbeulich@novell.com> wrote:> __put_page_type() is already a complex function actually, so let''s define a >> __put_final_page_type() containing a call to free_page_type() plus the >> current tail of free_page_type(). __put_page_type() can call that; >> relinquish memory can call free_page_type() directly. > > Will do it that way for submission. In testing it with that code inlined in > __put_page_type(), I can confirm that this closes the memory leak, but > it (obviously) doesn''t address the crash when encountering a PGT_partial > page hanging off of a page table being cleaned up by that explicit call > to free_page_type() getting executed as a side effect of > DOMAIN_DESTRUCT_AVOID_RECURSION. The question of course really > is whether it''s worthwhile trying to fix that, or rather to do away with it > altogether by utilizing the ''real'' preemption.I don''t actually understand the AVOID_RECURSION logic in relinquish_memory(). I''d be delighted to get rid of it. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28/10/08 10:49, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:>> Will do it that way for submission. In testing it with that code inlined in >> __put_page_type(), I can confirm that this closes the memory leak, but >> it (obviously) doesn''t address the crash when encountering a PGT_partial >> page hanging off of a page table being cleaned up by that explicit call >> to free_page_type() getting executed as a side effect of >> DOMAIN_DESTRUCT_AVOID_RECURSION. The question of course really >> is whether it''s worthwhile trying to fix that, or rather to do away with it >> altogether by utilizing the ''real'' preemption. > > I don''t actually understand the AVOID_RECURSION logic in > relinquish_memory(). I''d be delighted to get rid of it.Actually I''m not sure how PGT_partial reference counting works either. Do such pages hold a general reference count? It appears not, so I don''t see why for example we couldn''t have the put_page() in get_page_and_type_from_pagenr() cause free_domheap_pages() and then BUG_ON() the fact that the type count is non-zero (because it is partially validated). Shouldn''t free_domheap_pages() do cleanup for partially validated pages? Is it possible or necessary to tell the difference between a page that is partially alloc_page_type()d versus one which is partially free_page_type()d? (e.g., are the reference counts left in different states in each case, and does that matter?) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir.fraser@eu.citrix.com> 28.10.08 12:15 >>> >Actually I''m not sure how PGT_partial reference counting works either. Do >such pages hold a general reference count? It appears not, so I don''t see >why for example we couldn''t have the put_page() in >get_page_and_type_from_pagenr() cause free_domheap_pages() and then BUG_ON() >the fact that the type count is non-zero (because it is partially >validated).Correct, that''s the second bug I referred to. I think that we have to retain a general reference for a partially validated page.>Shouldn''t free_domheap_pages() do cleanup for partially validated pages?I don''t think that would be the right place. This should be naturally done in the normal code paths, by ensuring (which I think we do) to always check for PGT_partial along with PGT_validated.>Is it possible or necessary to tell the difference between a page that is >partially alloc_page_type()d versus one which is partially >free_page_type()d? (e.g., are the reference counts left in different states >in each case, and does that matter?)I think it''s unnecessary to distinguish these, as long as the page''s state doesn''t differ when getting its PGT_partial flag set either way. And I believe that state is consistent at present (apart from the missing page reference, which however is consistently missing in both cases). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28/10/08 11:43, "Jan Beulich" <jbeulich@novell.com> wrote:>>>> Keir Fraser <keir.fraser@eu.citrix.com> 28.10.08 12:15 >>> >> Actually I''m not sure how PGT_partial reference counting works either. Do >> such pages hold a general reference count? It appears not, so I don''t see >> why for example we couldn''t have the put_page() in >> get_page_and_type_from_pagenr() cause free_domheap_pages() and then BUG_ON() >> the fact that the type count is non-zero (because it is partially >> validated). > > Correct, that''s the second bug I referred to. I think that we have to retain > a general reference for a partially validated page.And who will ultimately destroy that reference if we e.g., xm destroy a guest? Are you hoping the circular-reference destruction logic will deal with this case also? Seems to me there would be a difference between preempted get_page_type() and preempted put_page_type() here -- in the latter case there is still a holder of the outstanding general reference count (the holder who tried to do the put_page_type() which got preempted) whereas in the former case the general reference count is not really held by anyone and so relinquish_memory() would indeed have to dispose of it? Basically messing with the general reference count seems pretty dangerous to me. I would have been inclined to put some destruction in free_domheap_pages() (although of course then that function also becomes potentially preemptible in future!). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir.fraser@eu.citrix.com> 28.10.08 13:06 >>> >And who will ultimately destroy that reference if we e.g., xm destroy a >guest? Are you hoping the circular-reference destruction logic will deal >with this case also? Seems to me there would be a difference between >preempted get_page_type() and preempted put_page_type() here -- in the >latter case there is still a holder of the outstanding general reference >count (the holder who tried to do the put_page_type() which got preempted) >whereas in the former case the general reference count is not really held by >anyone and so relinquish_memory() would indeed have to dispose of it?If we set the simple rule of "Whoever sets PGT_partial must make sure he leaves an extra ref pending, and whoever clears PGT_partial must drop that reference", we should be fine in my opinion. Also, a preempted get_page_and_type_from_pagenr() drops the general reference, and a preempted put_page_and_type() retains it, so to me the page state seems to be consistent between the two. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28/10/08 12:47, "Jan Beulich" <jbeulich@novell.com> wrote:> If we set the simple rule of "Whoever sets PGT_partial must make sure > he leaves an extra ref pending, and whoever clears PGT_partial must drop > that reference", we should be fine in my opinion.Hmmm... you could be right. It sounds better than what we have just now at least. :-) -- Keir> Also, a preempted get_page_and_type_from_pagenr() drops the general > reference, and a preempted put_page_and_type() retains it, so to me > the page state seems to be consistent between the two._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir.fraser@eu.citrix.com> 28.10.08 11:25 >>> >On 28/10/08 10:05, "Jan Beulich" <jbeulich@novell.com> wrote: > >>> Ah, looks like it''s been broken since the preemptible page_type patch went >>> in. Perhaps the tail of free_page_type() should go into __put_page_type(), >>> as it''s not needed by the call site in relinquish_memory(): the caller >>> doesn''t really hold a type reference to be dropped; and the logic for being >>> preempted doesn''t apply since relinquish_memory() requests no preemption. >> >> It doesn''t at present, but it should (in favor of >> DOMAIN_DESTRUCT_AVOID_RECURSION), >> including the put_page_and_type() earlier in that function. But of course, >> it may still turn out that cleaning up after preemption here must be handled >> differently from the __put_page_type() case. I''ll give moving that part >> (and removing the put_page() added yesterday) a try.Okay, seems like it finally starts to work - it turned out that indeed we need to distinguish whether a page was partially allocated or partially freed: Partial allocation does not grab a general page reference, while partial free doesn''t drop it. My test box now survived 8 instances (out of 100 VM destroys) where the VM got killed with a partially allocated page (and also without leaking memory), but I surely want to run it for some more time, especially also after removing all the debugging output I had to add to figure what''s going on (which makes the timing all different, but luckily apparently increased the likelihood of running into the problem). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel