> On Thu, 2013-08-08 at 9:29 +0100, Ian wrote:
>> On Wed, 2013-08-07 at 10:35 -0700, waitxie wrote:
>>> Hi, Jui-Hao,
>>
>> Did you intend to CC this person?
>>
>>> I want to enable memory sharing in Xen. I am a newbie in Xen. I
>>> searched the internet; there is little information about memory
>>> sharing. Your Google document cannot be accessed any more:
>>>
>>> https://docs.google.com/document/pub?id=16cK1JtXhhyNkGHtV_0XTkFnM2nkVxSP144EjozioC6w#h.y4a96-augwr7
>>>
>>> Can you give me some instructions about enabling memory sharing in
>>> Xen? Thanks.
>
> Maybe, but since it was in reply to a 2.5-year-old email, I'm not sure
> anything would be forthcoming. However, I am also interested in memory
> sharing, mainly among Linux domains running off a common COW core
> image, so allow me to bring in a few specific questions:
>
> 1) Is memory sharing currently working? I assume it is, given today's
> emails from Tim Deegan and Nai Xia, but it seems there may have been
> times it was not working.

Correct and correct, but since 4.2 we have a working subsystem.

> 2) It sounds like for domains running off a common base image, and in
> particular Linux, the blkback2-initiated memory sharing is very
> effective, and introduces little overhead. Is the blkback2 memory
> sharing currently working (this, more definitely, appears to have been
> disabled at some point)? How is it invoked (for example, configure
> options and/or command-line parameters)?

That part hasn't been tested for a while. What certainly works is the
hypervisor substrate. You need to rebuild and relink blktap* to use
tools/memshr, and then figure out whether that works and how to
leverage it. tools/memshr is one of potentially many policies you can
write in user space to identify what to share.

> 3) Various literature points out that zero-page sharing is effective
> for Windows, mainly due to scrubbing of pages when they are freed
> (page scrubbing by Xen on domain startup also seems to help).
> Is there any code available for identifying zero pages and getting the
> needed hypercalls invoked? This might be a naive page scanner running
> in dom0, or a more OS-aware scanner capable of crawling through the
> kernel to pull out free-page info. Has anyone modified the Linux
> kernel to zero out freed pages to better exploit zero-page sharing?

YMMV there. You are sharing pages that will be unshared with high
certainty. There is no code doing what you propose, but it's not
terribly difficult to write. The "naive page scanner", as you put it,
would scan, find a zero page, nominate it for sharing, scan again to
prevent a TOCTTOU race between the end of the scan and the nomination,
and then attempt to share. Note that the sharing hypercall will
automatically fail in the case of TOCTTOU between the nominate and
share actions. No one has modified Linux to do that, to the best of my
knowledge.

> 4) Memory ballooning seems fairly new for HVMs. Does it conflict with
> memory sharing? For example, it might only be safe to do blkback2
> sharing, but better to allow only ballooning to manage free pages to
> avoid confusion over page ownership. Also, when a page is freed due
> to sharing, is it then available for ballooning? I suppose if all of
> an HVM's free pages are being quickly scrubbed and shared, ballooning
> could be superfluous for that domain.

Depends on your definition of fairly new :) We took great care to
ensure ballooning and sharing coexist peacefully. I think it's useless
to balloon shared pages because you are not winning a single byte back
into the system, but it can be done. "When a page is freed due to
sharing", it goes back to the host-wide pool. It is *not* available for
ballooning. The guest has no idea it is a shared page.
The host is free to use the freed page for whatever purpose.

> 5) Is there any code available that runs inside a domain for
> identifying zero pages (and possibly identifying and hashing pages
> that might be less dynamic) and passing that information to dom0? For
> example, a kernel module or other code with appropriate kernel hooks.

None that I know of.

> 6) Does OS caching need to be disabled, or more likely dialed down to
> a small maximum, to ensure free memory gets shared and relinquished?

Guest-side OS caching is irrelevant. Dom0-side OS caching is disabled
in most cases when using blkback or blktap. I am not sure what you are
talking about here, though: you are linking caching to freeing memory
to sharing. Caching is a good thing, and that's precisely what
tools/memshr aims to share: the page cache on the guest OS side.

> 7) What is the behavior when a page needs to be unshared on an
> overcommitted system, and no pages are available? For example, does
> the unlucky domain get hung or destroyed? Swapping has been mentioned,
> with accompanying issues of identifying appropriate pages to be
> swapped. Is there any code implementing swapping?

You need to come up with an appropriate fallback policy. The hypervisor
will use the mem event subsystem to alert you that the page cannot be
unshared due to ENOMEM, and will pause the vcpu until you kick it back,
allowing it to try again. Look for the shared memory ring in
<xen>/xen/arch/x86/mm/mem_sharing and
<xen>/tools/libxc/xc_mem_sharing.

> 8) My understanding is that only HVM-based, and not PV-based, domUs
> can share pages. However, even if such a feature is not presently
> implemented, could dom0 share pages with an HVM domain?

You cannot share pages in PV because PV controls the physical-to-
machine mapping of pages. Sharing needs to change this mapping
arbitrarily, asynchronously, and at any point in time to be able to
share and unshare.
It will never work for pure PV.

> 9) Aside from the occasional academic projects, is memory sharing
> being actively used anywhere?

Gridcentric Inc for sure, all the time.

Andres

> 10) Any caveats besides the general risk of an overcommitted set of
> domains requiring more memory than is available?
>
> Sorry; that was a few more questions than I started off with. They
> are in order of importance, if that helps...
>
> Thank you,
> Eric
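The scan, nominate, rescan, share protocol described in the answer to question 3 can be sketched as a simulation. Everything below is an illustrative stand-in, not Xen code: in real tooling the nominate/share calls would be libxc hypercall wrappers, and it is the hypervisor itself that rejects the share if the page changed after nomination (the TOCTTOU check).

```python
# Simulation of the scan -> nominate -> rescan -> share protocol for a
# naive zero-page scanner.  SimulatedHypervisor is hypothetical: it
# stands in for the hypervisor, which snapshots a page at nomination
# and refuses to share it if the contents changed afterwards.

PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)

class SimulatedHypervisor:
    def __init__(self, pages):
        self.pages = pages            # gfn -> mutable page contents
        self.handles = {}             # nomination handle -> (gfn, snapshot)
        self.next_handle = 0
        self.shared = set()           # gfns collapsed onto the zero page

    def nominate(self, gfn):
        """Nominate a page for sharing; returns an opaque handle."""
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = (gfn, bytes(self.pages[gfn]))
        return handle

    def share(self, handle):
        """Share a nominated page; fails if it changed since nomination."""
        gfn, snapshot = self.handles.pop(handle)
        if bytes(self.pages[gfn]) != snapshot:
            return False              # TOCTTOU detected by the hypervisor
        self.shared.add(gfn)
        return True

def scan_and_share_zero_pages(hv):
    shared = 0
    for gfn in list(hv.pages):
        if bytes(hv.pages[gfn]) != ZERO_PAGE:
            continue                  # first scan: find candidate zero pages
        handle = hv.nominate(gfn)     # nominate for sharing
        if bytes(hv.pages[gfn]) != ZERO_PAGE:
            continue                  # second scan: narrow the race window
        if hv.share(handle):          # hypervisor re-checks regardless
            shared += 1
    return shared
```

Note that the second scan only narrows the window; the authoritative check is the snapshot comparison inside share(), which is why the hypervisor can safely fail the call on its own.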
Thank you for taking the time; it answered a number of lingering
questions.

If anyone is successfully using blkback2 with memshr, I would
appreciate hearing about it.

>> 6) Does OS caching need to be disabled, or more likely dialed down
>> to a small maximum, to ensure free memory gets shared and
>> relinquished?
>
> OS guest side caching is irrelevant. OS dom0 caching is disabled in
> most cases when using blkback or blktap. I am not sure what you are
> talking about here though: you are linking caching to freeing memory
> to sharing. Caching is a good thing and that's precisely what
> tools/memshr, the page cache on the OS guest side, aims to share.

Maybe an example will avoid abusing some of those terms:

Given a standard Linux system with swap disabled, a set of processes
will have a certain minimum RAM requirement for instructions and data
(I have seen the term "working set" applied) - for example, more than
1GB of RAM will be needed to run a large full-chip simulation.

Generally there is more than that minimum amount of RAM. Over time,
except with a small working set of programs and data, all or most of
the excess RAM tends to be put to use in the page cache (rather than
being entirely unused), but is readily available to be directly
assigned to processes. I assume pages in the page cache would contain
useful, non-zero data, such as recent block I/O data. This appears to
exclude zero-page sharing, and may not be all that successful for
same-page sharing.

In general, I probably would find it more useful if (most of) the
excess pages were available for other domains via sharing. For
example, maybe it would be better for the excess pages to be part of
the page cache for a storage domain common to the other domains,
allowing it to make more holistic caching decisions and hopefully
already have the more active blocks in its cache - perhaps affording
some TMEM-like benefits to non-TMEM-capable OSes (which is pretty much
anything other than Linux?).
The question was mainly: if I lazily/conservatively overallocate
excess memory to domains, and hope page sharing will automagically
minimize their footprint, will the use and dirtying of excess pages by
the page cache cripple their sharing? If so, I am curious whether it
would make sense to cap the page cache, if possible, at say 100MB. I
suspect totally disabling the page cache is impossible or destroys
performance.
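The concern in that last question can be put into rough numbers. The following is a back-of-the-envelope sketch, not a measurement; the domain count, allocation, working set, and dirtied fraction are all hypothetical parameters:

```python
# Rough model of overallocation with zero-page sharing: each domain is
# allocated alloc_gb but actively uses working_gb; whatever fraction of
# the excess the page cache dirties holds non-zero data and cannot be
# collapsed onto the shared zero page.  All numbers are illustrative.

def host_footprint_gb(n_domains, alloc_gb, working_gb, cache_fraction):
    excess = alloc_gb - working_gb
    dirty = excess * cache_fraction      # excess filled by the page cache
    return n_domains * (working_gb + dirty)

# 10 domains, each allocated 4GB with a 1GB working set:
capped = host_footprint_gb(10, 4, 1, 0.1)    # page cache capped small
uncapped = host_footprint_gb(10, 4, 1, 1.0)  # cache dirties all excess
```

With the cache capped, the 10 domains fit in roughly 13GB; with the page cache left to consume all excess memory, nothing beyond the allocation is reclaimable and the footprint is the full 40GB, which is exactly the "crippled sharing" scenario being asked about.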
On Aug 9, 2013, at 12:34 PM, Eric Shelton <eshelton@pobox.com> wrote:

> Thank you for taking the time; it answered a number of lingering
> questions.
>
> If anyone is successfully using blkback2 with memshr, I would
> appreciate hearing about it.
>
>>> 6) Does OS caching need to be disabled, or more likely dialed down
>>> to a small maximum, to ensure free memory gets shared and
>>> relinquished?
>>
>> OS guest side caching is irrelevant. OS dom0 caching is disabled in
>> most cases when using blkback or blktap. I am not sure what you are
>> talking about here though: you are linking caching to freeing memory
>> to sharing. Caching is a good thing and that's precisely what
>> tools/memshr, the page cache on the OS guest side, aims to share.
>
> Maybe an example will avoid abusing some of those terms:
>
> Given a standard Linux system with swap disabled, a set of processes
> will have a certain minimum RAM requirement for instructions and data
> (I have seen the term "working set" applied) - for example, more than
> 1GB of RAM will be needed to run a large full-chip simulation.
>
> Generally there is more than that minimum amount of RAM. Over time,
> except with a small working set of programs and data, all or most of
> the excess RAM tends to be put to use in the page cache (rather than
> being entirely unused), but is readily available to be directly
> assigned to processes. I assume pages in the page cache would contain
> useful, non-zero data, such as recent block I/O data. This appears to
> exclude zero-page sharing, and may not be all that successful for
> same-page sharing.
>
> In general, I probably would find it more useful if (most of) the
> excess pages were available for other domains via sharing.
> For example, maybe it would be better for the excess pages to be part
> of the page cache for a storage domain common to the other domains,
> allowing it to make more holistic caching decisions and hopefully
> already have the more active blocks in its cache - perhaps affording
> some TMEM-like benefits to non-TMEM-capable OSes (which is pretty
> much anything other than Linux?).

That whole description really seems like TMEM.

> The question was mainly: if I lazily/conservatively overallocate
> excess memory to domains, and hope page sharing will automagically
> minimize their footprint, will the use and dirtying of excess pages
> by the page cache cripple their sharing? If so, I am curious whether
> it would make sense to cap the page cache, if possible, at say 100MB.
> I suspect totally disabling the page cache is impossible or destroys
> performance.

You just can't do that in Linux. The page cache is so intimately baked
in, there is no notion of "turning it off".

Andres
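Since TMEM keeps coming up: the "common caching storage domain" idea maps onto TMEM's ephemeral pools, whose defining property is a put/get store with no retention guarantee. The sketch below is conceptual only; the real Xen tmem interface differs in its details, and the class and keys here are illustrative inventions.

```python
# Conceptual sketch of a TMEM-style ephemeral pool: guests "put"
# evicted clean page-cache pages into a host-side pool and may "get"
# them back on a refault, but the host may drop any entry at any time.
# This mimics the semantics only; it is not the Xen tmem ABI.

class EphemeralPool:
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.store = {}               # (domain, object, index) -> page data

    def put(self, key, data):
        if len(self.store) >= self.capacity:
            # Host under memory pressure: silently drop the oldest entry.
            self.store.pop(next(iter(self.store)))
        self.store[key] = data

    def get(self, key):
        # A miss is legitimate: the host never promised to keep the page.
        return self.store.pop(key, None)
```

The "not readily resizeable" worry raised above corresponds to the capacity parameter here: because entries are droppable, the host can shrink the pool at will, which is precisely what a conventional storage-domain page cache cannot do as easily.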
On Mon, Aug 12, 2013 at 10:58 AM, Andres Lagar-Cavilla
<andreslc@gridcentric.ca> wrote:

> > > ... The "naive page scanner" as you put it, would scan, find a
> > > zero page, nominate for sharing, scan again to prevent TOCTTOU
> > > race between end of scan and nomination, and then attempt to
> > > share. Note that the sharing hypercall will automatically fail in
> > > the case of TOCTTOU between nominate and share actions.

Given that the hypervisor will reject the TOCTTOU case, doesn't that
eliminate any race hazard? I saw the second scan mentioned in a paper
as well, but if it only shrinks, but does not close, the TOCTTOU
window, why bother? Are you trying to avoid the expense of a
hypercall? If that is an issue, maybe it is worth having versions of
xc_memshr_nominate_gfn() and xc_memshr_share_gfns() that handle more
than one pair of pages at a time.

> > In general, I probably would find it more useful if (most of) the
> > excess pages were available for other domains via sharing. For
> > example, maybe it would be better for the excess pages to be part
> > of the page cache for a storage domain common to the other domains,
> > allowing it to make more holistic caching decisions and hopefully
> > already have the more active blocks in its cache - perhaps
> > affording some TMEM-like benefits to non-TMEM-capable OSes (which
> > is pretty much anything other than Linux?).
>
> That whole description really seems like TMEM.

As best as I understand it, this would provide only very limited
aspects of TMEM, and perhaps a substitute for ballooning.
Nevertheless, maybe there is a win in giving these more limited, but
still significant, benefits to non-TMEM-aware and/or
non-ballooning-aware domains.

TMEM-like aspect: The common caching storage domain is perhaps
something like cleancache. However, this cache may not be readily
resizeable to respond to host-wide memory pressures.
Balloon-like aspect: If guest operating systems can be persuaded to
(1) limit the size of their page cache, and (2) zero out the remaining
free pages, zero-page merging might be an alternative to ballooning,
and could even avoid issues arising from the balloon driver not
meeting kernel memory demands quickly enough (e.g., allocate 8GB of
memory, of which 6GB is typically free, zeroed, and shared). However,
Linux does not seem to be tunable to provide either (1) or (2), and
would have to be, and I think can be, patched to do so. Windows
provides SetSystemFileCacheSize(), which appears to often be used to
reduce VM memory footprint and provides (1), and my understanding is
that (2) is default Windows behavior.

>> The question was mainly: if I lazily/conservatively overallocate
>> excess memory to domains, and hope page sharing will automagically
>> minimize their footprint, will the use and dirtying of excess pages
>> by the page cache cripple their sharing? If so, I am curious
>> whether it would make sense to cap the page cache, if possible, at
>> say 100MB. I suspect totally disabling the page cache is impossible
>> or destroys performance.
>
> You just can't do that in Linux. The page cache is so intimately
> baked in, there is no notion of "turning it off".

It is not a surprise that it cannot be turned off. Even if it were, I
imagine trying to run with uncached block I/O, even if a storage
domain held the data in its cache, would drop performance by several
orders of magnitude. However, setting a maximum number of pages for
the page cache seems doable. There is a clearly identifiable value (or
sum of the value across zones) in the kernel that I would like to put
a ceiling on, nr_file_pages, which can be seen in /proc/zoneinfo and
directly corresponds to the "buffers" and "cached" values indicated by
free (specifically, nr_file_pages * 4 = buffers + cached).
Perhaps all that is needed is an additional sysctl value,
max_nr_file_pages, which would establish a ceiling for nr_file_pages,
and getting the page cache to respect it. As the benefit of this is
then realized by zeroing out the remaining free pages for merging,
zeroing would also ideally occur right after pages were freed. Exactly
how this happens seems to be a more delicate operation, since zeroing
a page probably takes a nontrivial amount of time as far as the VMM is
concerned. Does Linux follow an MRU policy for reallocating free
pages? It would be helpful not to waste time zeroing the next n pages
up for allocation.

- Eric
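The behavior the proposed knob would give can be sketched in a few lines. To be clear, max_nr_file_pages is the hypothetical sysctl from the message above; no such tunable exists in mainline Linux, and the class below is an illustration of the eviction-plus-zeroing policy, not kernel code:

```python
# Sketch of a page cache bounded by a hypothetical max_nr_file_pages
# ceiling: inserting beyond the cap evicts least-recently-used entries,
# and each evicted page is counted as zeroed, i.e. made available for
# zero-page merging by a scanner.

from collections import OrderedDict

class CappedPageCache:
    def __init__(self, max_nr_file_pages):
        self.max_pages = max_nr_file_pages
        self.cache = OrderedDict()    # block -> data, in LRU order
        self.freed_zeroed = 0         # pages zeroed and ready to merge

    def insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)          # mark most recently used
        while len(self.cache) > self.max_pages:
            self.cache.popitem(last=False)     # evict the LRU page...
            self.freed_zeroed += 1             # ...and zero it for merging

cache = CappedPageCache(max_nr_file_pages=3)
for blk in range(5):
    cache.insert(blk, b'data')
```

The MRU question at the end of the message would show up here as a refinement: rather than zeroing every evicted page immediately, one would hold back the few pages most likely to be reallocated next, so the zeroing effort is not wasted.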