Jan Beulich
2010-Mar-24 11:07 UTC
[Xen-devel] [PATCH] x86: fix improper return value from relinquish_memory()
While apparently only a theoretical possibility (domain_kill() has a
BUG_ON() that wasn't reported to trigger so far), I still think it is
better to have the code cleaned up.

Signed-off-by: Jan Beulich <jbeulich@novell.com>

--- 2010-03-22.orig/xen/arch/x86/domain.c	2010-03-22 00:00:00.000000000 +0100
+++ 2010-03-22/xen/arch/x86/domain.c	2010-03-24 11:57:49.000000000 +0100
@@ -1821,8 +1821,10 @@ static int relinquish_memory(
         {
         case 0:
             break;
-        case -EAGAIN:
         case -EINTR:
+            ret = -EAGAIN;
+            /* fallthrough */
+        case -EAGAIN:
             page_list_add(page, list);
             set_bit(_PGT_pinned, &page->u.inuse.type_info);
             put_page(page);

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
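For readers who want to see the effect of the hunk in isolation, here is a minimal stand-alone sketch of the same switch shape. It is not Xen code: put_step_sketch(), relinquish_sketch() and kill_step_sketch() are hypothetical names, the "requeued" flag merely stands in for the page_list_add()/set_bit()/put_page() sequence, and the caller contract (only 0 or -EAGAIN is acceptable) is an assumption inferred from the BUG_ON() mentioned in the patch description.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the preemptible put operation (not a Xen
 * function): it returns whatever outcome the caller asks it to simulate. */
static int put_step_sketch(int simulated_outcome)
{
    return simulated_outcome;
}

/* Same switch shape as the patched code: -EINTR is rewritten to -EAGAIN
 * and then shares the "put the page back and retry later" path. */
static int relinquish_sketch(int simulated_outcome, int *requeued)
{
    int ret = put_step_sketch(simulated_outcome);

    *requeued = 0;
    switch ( ret )
    {
    case 0:
        break;
    case -EINTR:
        ret = -EAGAIN;
        /* fallthrough */
    case -EAGAIN:
        *requeued = 1;   /* stands in for page_list_add() and friends */
        break;
    default:
        break;
    }
    return ret;
}

/* Assumed caller contract: only 0 or -EAGAIN may come back. The assert()
 * plays the role of the BUG_ON() mentioned in the patch description. */
static int kill_step_sketch(int simulated_outcome)
{
    int requeued;
    int rc = relinquish_sketch(simulated_outcome, &requeued);

    assert(rc == 0 || rc == -EAGAIN);
    return rc;
}
```

With the original case ordering, a simulated -EINTR would have been returned unchanged and would have tripped the assertion in kill_step_sketch(); with the reordered cases, the caller only ever sees 0 or -EAGAIN.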
Jan Beulich
2010-Mar-24 11:23 UTC
Re: [Xen-devel] [PATCH] x86: fix improper return value from relinquish_memory()
>>> "Jan Beulich" <JBeulich@novell.com> 24.03.10 12:07 >>>
>While apparently only a theoretical possibility (domain_kill() has a
>BUG_ON() that wasn't reported to trigger so far), I still think it is
>better to have the code cleaned up.

Btw., the reason I was looking at that code was that we observe
zombie domains - ones in DOMDYING_dead state, perhaps having
almost none of their memory freed (shadowed guests appear to be
particularly bad). In one of the reports, an interesting extra fact
was that this happened only for the first 100 guests - any
subsequent ones got destroyed properly (obviously to get there
this requires quite a bit of memory in the host). Has anyone else
observed this? Does this ring any bells?

Jan
Konrad Rzeszutek Wilk
2010-Mar-24 14:43 UTC
Re: [Xen-devel] [PATCH] x86: fix improper return value from relinquish_memory()
On Wed, Mar 24, 2010 at 11:23:32AM +0000, Jan Beulich wrote:
> >>> "Jan Beulich" <JBeulich@novell.com> 24.03.10 12:07 >>>
> >While apparently only a theoretical possibility (domain_kill() has a
> >BUG_ON() that wasn't reported to trigger so far), I still think it is
> >better to have the code cleaned up.
>
> Btw., the reason I was looking at that code was that we observe
> zombie domains - ones in DOMDYING_dead state, perhaps having
> almost none of their memory freed (shadowed guests appear to be
> particularly bad). In one of the reports, an interesting extra fact
> was that this happened only for the first 100 guests - any
> subsequent ones got destroyed properly (obviously to get there
> this requires quite a bit of memory in the host). Has anyone else
> observed this? Does this ring any bells?

Yes.
http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00222.html

B/c of the page count we had guests that would never have their mmap
count removed causing them to be zombie guests. Our fix, which wasn't
nice, was to have the guest domain id re-number and shove it and its
remaining page ownership (at that point it only has some pages in Dom0
and DomU) in a corner.
Keir Fraser
2010-Mar-24 14:54 UTC
Re: [Xen-devel] [PATCH] x86: fix improper return value from relinquish_memory()
On 24/03/2010 14:43, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
>> Btw., the reason I was looking at that code was that we observe
>> zombie domains - ones in DOMDYING_dead state, perhaps having
>> almost none of their memory freed (shadowed guests appear to be
>> particularly bad). In one of the reports, an interesting extra fact
>> was that this happened only for the first 100 guests - any
>> subsequent ones got destroyed properly (obviously to get there
>> this requires quite a bit of memory in the host). Has anyone else
>> observed this? Does this ring any bells?
>
> Yes.
> http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00222.html
>
> B/c of the page count we had guests that would never have their mmap
> count removed causing them to be zombie guests. Our fix, which wasn't
> nice, was to have the guest domain id re-number and shove it and its
> remaining page ownership (at that point it only has some pages in Dom0
> and DomU) in a corner.

There's a big difference between a zombie domain owning a few pages versus a
zombie domain still having most of its memory, though. One could be a
ref-count leak, the other sounds potentially like something wrong with the
domain-killing routines (not that there's enough data to definitely say
either way for sure yet -- but at least they do sound like different bugs).

 -- Keir