thr3ads.net - Xen devel - [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying [Nov 2009]

George Dunlap

2009-Nov-11 15:08 UTC

[Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

Handle PoD operations properly when a domain is dying.

No populate-on-demand activities should happen when a domain is dying.
 Especially, it is a bug for memory to be added to the PoD cache when
d->is_dying is non-zero, since if this happens after the cache has
been emptied, these pages will never be freed. This may cause "zombie
domains" to linger.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Dan Magenheimer

2009-Nov-11 15:26 UTC

RE: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

Hi George --

A possibly related issue came up in a team discussion
yesterday.  It is not uncommon for management tools to
check available memory to decide if there is sufficient
space to provision a new domain, for example when choosing
a machine on which to start a new domain.  But there is
no "lock" so if the answer is yes and the machine
is chosen, but moments later some memory becomes used
(by PoD or ballooning or ???), domain creation might
fail due to insufficient memory.

Tmem has a "freeze" feature to avoid this problem, but
tmem has the advantage that all domains will continue
to function even if tmem is frozen... I'm not sure
that's true for PoD.

Clearly management tools can be very conservative
and assume every domain on a machine is using its
maxmem, but that defeats the purpose of much of the
overcommitment work we are doing.  I know Citrix
is working on hypervisor swap, but that has a pretty
horrible performance penalty, as well as some
interesting functionality limitations/challenges.

Any clever ideas on how to deal with this problem
in the future?  For example, assume someone wants
to launch a sequence of PoD domains that will be
idle when launched but eventually will utilize all
(or even most) of maxmem.

Thanks,
Dan


George Dunlap

2009-Nov-11 15:42 UTC

Re: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

Dan,

PoD memory is pre-allocated to a VM.  A PoD fault will never cause a
domain to allocate any more memory than has already been allocated to
it.  That's why it's called "populate on demand" instead of "allocate on
demand".

(The failure mode in this case was this:
 * domain_kill() called
 * pod_empty_cache() called
 * grant table op calls gfn_to_mfn on a PoD page
 * pod_demand_populate finds the cache empty, and runs an emergency scan
 * emergency scan pulls some unfreed pages from the p2m table to put in
   the pod cache
 * p2m pages freed, p2m table torn down
 * memory in the pod cache keeps domain from dying.
So no new memory was being allocated; it was just being transferred
from a place which hadn't been destroyed yet to a place which had.)
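
As a rough illustration of the sequence above, here is a toy Python model. It is only a sketch: the class, the method names and the page counts are invented for the example and are not Xen's actual data structures or code. The `check_dying` flag stands in for the kind of guard the patch description asks for (no PoD activity once the domain is dying).

```python
class Domain:
    """Toy model of a domain's PoD bookkeeping (hypothetical, not Xen code)."""

    def __init__(self, p2m_pages, cache_pages):
        self.p2m_pages = p2m_pages    # populated pages still held in the p2m table
        self.pod_cache = cache_pages  # pages parked in the PoD cache
        self.is_dying = False

    def kill(self):
        # Steps 1 and 2: mark the domain dying, then empty the PoD cache.
        self.is_dying = True
        self.pod_cache = 0

    def demand_populate(self, check_dying):
        # With the guard enabled, a dying domain never refills its cache.
        if check_dying and self.is_dying:
            return False
        if self.pod_cache == 0:
            # "Emergency scan": pull pages out of the p2m into the cache.
            reclaimed = min(self.p2m_pages, 4)
            self.p2m_pages -= reclaimed
            self.pod_cache += reclaimed
        if self.pod_cache == 0:
            return False
        # Populate one entry from the cache.
        self.pod_cache -= 1
        self.p2m_pages += 1
        return True

    def teardown_p2m(self):
        # The p2m pages are freed, but nothing empties the cache again.
        self.p2m_pages = 0

    def leaked_pages(self):
        return self.pod_cache + self.p2m_pages


def run(check_dying):
    d = Domain(p2m_pages=8, cache_pages=2)
    d.kill()
    d.demand_populate(check_dying)  # e.g. triggered by a grant table op
    d.teardown_p2m()
    return d.leaked_pages()
```

Without the guard, `run(False)` leaves pages stranded in the cache after teardown, which is what keeps the "zombie" around; with the guard, `run(True)` leaves nothing behind.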

WRT ballooning: the tools should be telling the balloon driver what to 
do; a well-behaved balloon driver will not allocate more memory unless 
the tools tell it to do so.  There are mechanisms in Xen available to 
the tools to limit the amount of memory a balloon driver can allocate 
even if it's not behaving.

So as long as the tools don't tell a VM to change its memory between
deciding there's enough memory for the new domain and creating the new
domain, you shouldn't have any problems.

-George



Dan Magenheimer

2009-Nov-11 16:53 UTC

RE: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

> (The failure mode in this case was this:
> :
> So no new memory was being allocated; it was just being transferred
> from a place which hadn't been destroyed yet to a place which had.)

Sorry, I should have noted in my original reply that this
was topic drift not necessarily related to your patch.
The patch just reminded me to raise this PoD-related issue.

> WRT ballooning: the tools should be telling the balloon driver what to
> do; a well-behaved balloon driver will not allocate more memory unless
> the tools tell it to do so.  There are mechanisms in Xen available to
> the tools to limit the amount of memory a balloon driver can allocate
> even if it's not behaving.

Yes ideally.  I believe, in reality, guest memory utilization
varies too dynamically for the tools to be involved in every
ballooning decision (which is why tmem is useful).

Even assuming your tools do completely control ballooning
though...

> So as long as the tools don't tell a VM to change its memory between
> deciding there's enough memory for the new domain and creating the new
> domain, you shouldn't have any problems.

BUT, PoD is essentially doing dynamic ballooning without
notifying the tools, correct?  Unless I misunderstand, the
whole point of PoD is to not use zeroed-out-by-Windows
memory until it gets written (with non-zeroes), and the
underlying objective is that that not-yet-used memory can
be used for other purposes -- such as other domains.
In the case of PoD, the memory usage of the PoD domain
is monotonically increasing (ignoring ballooning for now)
but it is essentially growing in size dynamically without
notifying the tools.  As a result, either the tools need
to assume that a PoD domain IS using all of its memory,
or they risk making a decision based on rapidly-changing
(and thus ultimately stale/incorrect) data.



Keir Fraser

2009-Nov-11 17:13 UTC

Re: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

On 11/11/2009 16:53, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> So as long as the tools don't tell a VM to change its memory between
>> deciding there's enough memory for the new domain and
>> creating the new
>> domain, you shouldn't have any problems.
>
> BUT, PoD is essentially doing dynamic ballooning without
> notifying the tools, correct?  Unless I misunderstand, the
> whole point of PoD is to not use zeroed-out-by-Windows
> memory until it gets written (with non-zeroes), and the
> underlying objective is that that not-yet-used memory can
> be used for other purposes -- such as other domains.

The main point is to be able to start a domain with less memory than it
thinks it has, so it can be ballooned up (under tools control) later.

 -- Keir




George Dunlap

2009-Nov-11 17:15 UTC

Re: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

Dan Magenheimer wrote:
> BUT, PoD is essentially doing dynamic ballooning without
> notifying the tools, correct?  Unless I misunderstand, the
> whole point of PoD is to not use zeroed-out-by-Windows
> memory until it gets written (with non-zeroes), and the
> underlying objective is that that not-yet-used memory can
> be used for other purposes -- such as other domains.

You misunderstand. :-)

The *only* point of PoD is to allow a VM to boot "pre-ballooned".  It is
not related to memory overcommit.  PoD memory is allocated to a domain
by the domain builder, and is only ever changed afterwards by explicit
calls made by the toolstack.  Memory is moved from the PoD "cache"* to
the p2m table and back again automatically, but the total amount of
memory owned by the domain is constant.

To review, the problem that PoD solves is the following:

HVM guests (both Linux and Windows) read the e820 map early in boot, and 
consider their memory size essentially fixed based on what they read.  
IOW, if Windows reads 1GiB in the e820 map, it will never use more than 
1GiB of RAM.

In a virtualized environment, we'd like to have the flexibility of 
booting a VM with 1GiB of RAM, but then increasing its RAM (say, up to 
4GiB) after boot if it is determined that the VM in question needs more 
memory.

Without PoD, your only option is to build the domain with 4GiB of RAM
and then wait for the balloon driver to balloon the VM down to 1GiB.
The problem with this, of course, is that you have to scrape together
the other 3GiB for the course of the boot.

With PoD, you pass the domain builder two values: 4GiB and 1GiB.  The
domain builder will fill the p2m table with 4GiB of PoD entries, and
then allocate 1GiB of RAM for the per-domain PoD "cache".  Xen will move
memory into and out of this "cache" as needed to allow the VM to boot
until the balloon driver loads. But the total amount of memory used by
the VM during this time is fixed at 1GiB.  If this pre-allocated amount
of RAM is used up, no more memory is allocated; the domain crashes.

The zero-page scans are used to recover memory from the p2m table and
put it back in the per-domain PoD "cache".  This memory is not returned
to Xen, and cannot be used for other VMs.

So if you're using PoD, you can trust that the memory used by the VM
will not change "under your feet" so to speak.
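
To make the accounting concrete, here is a small Python sketch of the behaviour described above. It is only a toy model under stated assumptions: the class and method names are invented for illustration, the units are arbitrary "pages", and none of this is Xen's real interface.

```python
class PodDomain:
    """Toy PoD accounting: memory moves between cache and p2m, total stays fixed."""

    def __init__(self, target_pages, cache_pages):
        self.pod_entries = target_pages  # p2m entries marked PoD, nothing behind them
        self.populated = 0               # p2m entries backed by real pages
        self.cache = cache_pages         # pages in the per-domain PoD "cache"
        self.crashed = False

    def total_owned(self):
        # Memory owned by the domain, wherever it currently sits.
        return self.populated + self.cache

    def touch(self):
        """Guest touches a PoD entry: move one page from the cache to the p2m."""
        if self.crashed or self.pod_entries == 0:
            return
        if self.cache == 0:
            # Pre-allocated memory is used up; nothing more is allocated.
            self.crashed = True
            return
        self.cache -= 1
        self.pod_entries -= 1
        self.populated += 1

    def zero_scan(self, zeroed_pages):
        """Move populated-but-zeroed pages back into the cache (not back to Xen)."""
        n = min(zeroed_pages, self.populated)
        self.populated -= n
        self.pod_entries += n
        self.cache += n
        return n
```

For example, with `PodDomain(target_pages=4, cache_pages=1)`, `total_owned()` stays at 1 through any mix of `touch()` and `zero_scan()` calls; the only way the model "fails" is by setting `crashed` when the cache runs dry.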

 -George

* I put the term "cache" in quotes because it is related to the normal
English definition of the word ("a hidden storage space"), rather than
the computer science meaning of the word (e.g., extra copies of data
used to speed up storage hierarchies).


Dan Magenheimer

2009-Nov-11 17:58 UTC

RE: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

Thanks for the detailed explanation.  I think
"Populate on Demand" sounds too much like
"Copy on Write" and doesn't really describe
the essence of what is going on: Booting pre-ballooned.
Thus my confusion.

> If this pre-allocated amount of RAM is used up, no
> more memory is allocated; the domain crashes.

This is of course less than ideal as it ensures that
administrators will choose some safe (probably fairly large)
amount of memory, a significant portion of which will often
be wasted.  But I suppose it's better than the (much larger)
alternative.

> So if you're using PoD, you can trust that the memory used by the VM
> will not change "under your feet" so to speak.

That answers my question then.

The problem still applies to ballooning... at least in Linux
there are ways that a guest sysadmin can manually adjust
the balloon without tools knowledge.  But I guess that's
a different discussion.

Thanks again,
Dan


Dave Scott

2009-Nov-11 18:12 UTC

RE: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

> A possibly related issue came up in a team discussion
> yesterday.  It is not uncommon for management tools to
> check available memory to decide if there is sufficient
> space to provision a new domain, for example when choosing
> a machine on which to start a new domain.  But there is
> no "lock" so if the answer is yes and the machine
> is chosen, but moments later some memory becomes used
> (by PoD or ballooning or ???), domain creation might
> fail due to insufficient memory.

In the xapi toolstack world we assume that all domains on a host will
be able to balloon down to their "dynamic_min" and use this in our
host-choosing logic. We bias the choice of host towards the ones with
the most memory free... but ultimately if the guests don't respond then
we'll still get an out-of-memory error.
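
A minimal sketch of that kind of host-choosing logic, in Python. The dictionary layout and function names are assumptions made up for this example, not xapi's actual code, and sizes are in MiB.

```python
def reclaimable_free(host):
    """Free memory if every domain ballooned down to its dynamic_min (MiB)."""
    reclaim = sum(d["current"] - d["dynamic_min"] for d in host["domains"])
    return host["free"] + reclaim


def choose_host(hosts, needed):
    """Among hosts that could fit the new domain, prefer the most free memory."""
    candidates = [h for h in hosts if reclaimable_free(h) >= needed]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h["free"])
```

Note that this only models the optimistic assumption: a host passes the filter if its guests could balloon down far enough, so creation can still fail later if they do not actually respond.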

We're also writing/debugging a host ballooning daemon which you can ask
to reserve memory for stuff like a domain creation. It asks the domains
to balloon down and then makes sure they stay down by setting maxmem.
I'm in the middle of writing up how it's supposed to work, I'll send a
link around for comments when I have a draft.
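
The reserve-then-clamp idea could look roughly like the following Python sketch. Everything here is hypothetical (the `Guest` class, `reserve`, the cooperative `balloon_to`); it is not the daemon's real design, just one way to picture "balloon down, then pin with maxmem". Sizes are in MiB.

```python
class Guest:
    """Hypothetical guest with a cooperative balloon driver."""

    def __init__(self, current, dynamic_min):
        self.current = current
        self.dynamic_min = dynamic_min
        self.maxmem = current

    def balloon_to(self, target):
        # A well-behaved guest releases memory down to the target.
        released = max(0, self.current - target)
        self.current -= released
        return released


def reserve(host_free, guests, needed):
    """Try to free `needed` MiB; return the new free amount, or None on failure."""
    for g in guests:
        if host_free >= needed:
            break
        shortfall = needed - host_free
        # Never ask a guest to go below its dynamic_min.
        target = max(g.dynamic_min, g.current - shortfall)
        host_free += g.balloon_to(target)
        # Clamp maxmem so the guest cannot balloon back up afterwards.
        g.maxmem = g.current
    return host_free if host_free >= needed else None
```

The clamp is what turns a momentary measurement into something closer to a lock: once maxmem is lowered, the freed memory should stay free until the new domain is built.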

> Tmem has a "freeze" feature to avoid this problem, but
> tmem has the advantage that all domains will continue
> to function even if tmem is frozen... I'm not sure
> that's true for PoD.

This sounds interesting -- do you have a link to a document somewhere?

Cheers,
Dave


Dan Magenheimer

2009-Nov-11 18:27 UTC

RE: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

(Sorry, George, for hijacking your thread :-)

> > A possibly related issue came up in a team discussion
> > yesterday.  It is not uncommon for management tools to
> > check available memory to decide if there is sufficient
> > space to provision a new domain, for example when choosing
> > a machine on which to start a new domain.  But there is
> > no "lock" so if the answer is yes and the machine
> > is chosen, but moments later some memory becomes used
> > (by PoD or ballooning or ???), domain creation might
> > fail due to insufficient memory.
>
> In the xapi toolstack world we assume that all domains on a
> host will be able to balloon down to their "dynamic_min" and
> use this in our host-choosing logic. We bias the choice of
> host towards the ones with the most memory free... but
> ultimately if the guests don't respond then we'll still get
> an out-of-memory error.
>
> We're also writing/debugging a host ballooning daemon which
> you can ask to reserve memory for stuff like a domain
> creation. It asks the domains to balloon down and then makes
> sure they stay down by setting maxmem. I'm in the middle of
> writing up how it's supposed to work, I'll send a link around
> for comments when I have a draft.

Great!  I'm definitely interested!

> > Tmem has a "freeze" feature to avoid this problem, but
> > tmem has the advantage that all domains will continue
> > to function even if tmem is frozen... I'm not sure
> > that's true for PoD.
>
> This sounds interesting -- do you have a link to a document somewhere?

Tmem is fully implemented in xen-unstable in both Xen
and the xm tools, but it currently needs to be turned
on by a boot option and does nothing without a tmem-modified
guest kernel.

There's a lot of generic description of tmem at
http://oss.oracle.com/projects/tmem
but as of now the only description of tmem-freeze is
in the code.  Let me know if you have any general
questions and I will try to answer.

Note however that the concept of "free memory" changes
a lot on any system with a guest using tmem.  There is
now "free memory" and "freeable memory" and the sum of
the two is memory that is available for other purposes
such as domain creation.
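
In other words, a tool deciding whether a new domain fits would look at the sum rather than at free memory alone. A tiny Python sketch, with hypothetical function names and MiB units assumed purely for illustration:

```python
def available_for_new_domain(free_mib, freeable_mib):
    """Memory usable for other purposes: free plus tmem-freeable."""
    return free_mib + freeable_mib


def can_create(free_mib, freeable_mib, needed_mib):
    """True if a domain needing `needed_mib` fits in free + freeable memory."""
    return available_for_new_domain(free_mib, freeable_mib) >= needed_mib
```

So a host with 256 MiB free and 1024 MiB freeable would, under this view, still have room for a 1024 MiB domain even though a check against free memory alone would say no.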

Thanks,
Dan


George Dunlap

2009-Nov-11 22:49 UTC

Re: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

On Wed, Nov 11, 2009 at 5:58 PM, Dan Magenheimer
<dan.magenheimer@oracle.com> wrote:
> Thanks for the detailed explanation.  I think
> "Populate on Demand" connotes too much like
> "Copy on Write" and doesn't really describe
> the essence of what is going on: Booting pre-ballooned.
> Thus my confusion.

Well, "populate on demand" describes exactly what's actually
happening: the p2m table is being populated as it's used, as opposed
to being populated all at once at the beginning.  If it was allocating
memory on demand it would have been called "allocate on demand". :-)

> This is of course less than ideal as it ensures that
> administrators will choose some safe (probably fairly large)
> amount of memory, a significant portion of which will often
> be wasted.  But I suppose it's better than the (much larger)
> alternative.

PoD is only meant to keep the domain going until the balloon driver
loads, which should be fairly early in boot.  How much is required
changes depending on the OS and the maximum amount of memory.  For
Windows XP, 32-bit, with 4GiB maximum (i.e., reported in the e820
map), 256 KiB is sufficient to start with (if I recall correctly).
For Windows 7, that's a lot higher.  I forget the exact number, but
it's 1-2GiB to boot with a 4GiB maximum.  Citrix is doing extensive
testing to find out minimums for a large number of configurations, so
that administrators using XenServer can confidently set the value to a
minimum knowing that they're neither wasting memory nor risking a
crash. :-)

 -George


Dan Magenheimer

2009-Nov-11 23:34 UTC

RE: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
>
> On Wed, Nov 11, 2009 at 5:58 PM, Dan Magenheimer
> <dan.magenheimer@oracle.com> wrote:
> > Thanks for the detailed explanation.  I think
> > "Populate on Demand" connotes too much like
> > "Copy on Write" and doesn't really describe
> > the essence of what is going on: Booting pre-ballooned.
> > Thus my confusion.
>
> Well, "populate on demand" describes exactly what's actually
> happening: the p2m table is being populated as it's used, as opposed
> to being populated all at once at the beginning.  If it was allocating
> memory on demand it would have been called "allocate on demand". :-)

I understand that from your (developer's) point of view
but 99% or more of your customers won't know what a p2m
table is or what populating it means and will assume
(as I did earlier), since this is a memory-related
feature, that "populate" refers to utilizing RAM.

It's a cool and valuable feature... I just
think it deserves a better name. ;-)

> > This is of course less than ideal as it ensures that
> > administrators will choose some safe (probably fairly large)
> > amount of memory, a significant portion of which will often
> > be wasted.  But I suppose its better than the (much larger)
> > alternative.
>
> PoD is only meant to keep the domain going until the balloon driver
> loads, which should be fairly early in boot.  How much is required
> changes depending on the OS and the maximum amount of memory;  For
> Windows XP, 32-bit, with 4GiB maximum (i.e., reported in the e820
> map), 256 KiB is sufficient to start with (if I recall correctly).
> For Windows 7, that's a lot higher.  I forget the exact number, but
> it's 1-2GiB to boot with a 4GiB maximum.  Citrix is doing extensive
> testing to find out minimums for a large number of configurations, so
> that administrators using XenServer can confidently set the value to a
> minimum knowing that they're neither wasting memory nor risking a
> crash. :-)

I guess that makes sense in the Windows world where "distros"
can be counted on one or two hands.

Dan


George Dunlap

2009-Nov-12 10:56 UTC

Re: [Xen-devel] [PATCH] PoD: Handle operations properly when domain is dying

Dan Magenheimer wrote:
> I understand that from your (developer's) point of view
> but 99% or more of your customers won't know what a p2m
> table is or what populating it means and will assume
> (as I did earlier), since this is a memory-related
> feature, that "populate" refers to utilizing RAM.

Since almost all the code referring to PoD is in a place that only a
developer should look (in the hypervisor, and a little bit in the hvm
domain builder), I think the naming (at least internally) is
appropriate.  One could imagine "shadow memory" being mis-interpreted in
a similar fashion.   :-)

PoD is only one piece of the puzzle required to allow booting
pre-ballooned.  If you feel like we need a "marketing name" for the
whole feature to put in announcements and changelogs, feel free to
suggest one. :-)

In XenServer, the whole thing will come under the umbrella of "Dynamic
Memory Control".

> I guess that makes sense in the Windows world where "distros"
> can be counted on one or two hands.

My guess is that Linux as a whole will have much less variance than
Windows.  But we're not testing HVM Linux ATM.
 
-George


