> On Thu, 2013-08-08 at 9:29 +0100, Ian wrote:
>> On Wed, 2013-08-07 at 10:35 -0700, waitxie wrote:
>>> Hi, Jui-Hao,
>>
>> Did you intend to CC this person?
>>
>>> I want to enable memory sharing in Xen. I am a newbie in Xen. I
>>> searched the internet; there is little information about memory
>>> sharing. Your Google document cannot be accessed any more:
>>>
>>> https://docs.google.com/document/pub?id=16cK1JtXhhyNkGHtV_0XTkFnM2nkVxSP144EjozioC6w#h.y4a96-augwr7
>>>
>>> Can you give me some instructions about enabling memory sharing in
>>> Xen? Thanks.
>
> Maybe, but since it was in reply to a 2.5-year-old email, I'm not sure
> anything would be forthcoming. However, I am also interested in memory
> sharing, mainly among Linux domains running off a common COW core
> image, so allow me to bring in a few specific questions:
>
> 1) Is memory sharing currently working? I assume it is, given today's
> emails from Tim Deegan and Nai Xia, but it seems there may have been
> times it was not working.

Correct and correct, but since 4.2 we have a working subsystem.

> 2) It sounds like for domains running off a common base image, and in
> particular Linux, the blkback2-initiated memory sharing is very
> effective, and introduces little overhead. Is the blkback2 memory
> sharing currently working (this, more definitely, appears to have been
> disabled at some point)? How is it invoked (for example, configure
> options and/or command-line parameters)?

That part hasn't been tested for a while. What certainly works is the
hypervisor substrate. You need to rebuild and relink blktap* to use
tools/memshr, and then figure out whether that works and how to
leverage it. tools/memshr is one of potentially many policies you can
write in user space to identify what to share.

> 3) Various literature points out that zero-page sharing is effective
> for Windows, mainly due to scrubbing of pages when they are freed
> (page scrubbing by Xen on domain startup also seems to help).
> Is there any code available for identifying zero pages and getting the
> needed hypercalls invoked? This might be a naive page scanner running
> in dom0, or a more OS-aware scanner capable of crawling through the
> kernel to pull out free-page info. Has anyone modified the Linux
> kernel to zero out freed pages to better exploit zero-page sharing?

YMMV there. You are sharing pages that will be unshared with high
certainty. There is no code doing what you propose, but it's not
terribly difficult to write. The "naive page scanner", as you put it,
would scan, find a zero page, nominate it for sharing, scan again to
prevent a TOCTTOU race between the end of the scan and the nomination,
and then attempt to share. Note that the sharing hypercall will
automatically fail in the case of TOCTTOU between the nominate and
share actions. No one has modified Linux to do that, to the best of my
knowledge.

> 4) Memory ballooning seems fairly new for HVMs. Does it conflict with
> memory sharing? For example, it might only be safe to do blkback2
> sharing, but better to allow only ballooning to manage free pages to
> avoid confusion over page ownership. Also, when a page is freed due
> to sharing, is it then available for ballooning? I suppose if all of
> an HVM's free pages are being quickly scrubbed and shared, ballooning
> could be superfluous for that domain.

Depends on your definition of fairly new :) We took great care to
ensure ballooning and sharing coexist peacefully. I think it's useless
to balloon shared pages because you are not winning a single byte back
into the system, but it can be done. "When a page is freed due to
sharing", it goes back to the host-wide pool. It is *not* available for
ballooning. The guest has no idea it is a shared page.
The host is free to use the freed page for whatever purpose.

> 5) Is there any code available that runs inside a domain for
> identifying zero pages (and possibly identifying and hashing pages
> that might be less dynamic) and passing that information to dom0? For
> example, a kernel module or other code with appropriate kernel hooks.

None that I know of.

> 6) Does OS caching need to be disabled, or more likely dialed down to
> a small maximum, to ensure free memory gets shared and relinquished?

Guest-side OS caching is irrelevant. Dom0-side OS caching is disabled
in most cases when using blkback or blktap. I am not sure what you are
talking about here, though: you are linking caching to freeing memory
to sharing. Caching is a good thing, and that's precisely what
tools/memshr aims to share: the page cache on the guest OS side.

> 7) What is the behavior when a page needs to be unshared on an
> overcommitted system, and no pages are available? For example, does
> the unlucky domain get hung or destroyed? Swapping has been mentioned,
> with accompanying issues of identifying appropriate pages to be
> swapped. Is there any code implementing swapping?

You need to come up with an appropriate fallback policy. The hypervisor
will use the mem event subsystem to alert you that the page cannot be
unshared due to ENOMEM, and will pause the vcpu until you kick it back,
allowing it to try again. Look for the shared memory ring in
<xen>/xen/arch/x86/mm/mem_sharing and
<xen>/tools/libxc/xc_mem_sharing.

> 8) My understanding is that only HVM-based, and not PV-based, domUs
> can share pages. However, even if such a feature is not presently
> implemented, could dom0 share pages with an HVM domain?

You cannot share pages in PV because PV controls the physical-to-
machine mapping of pages. Sharing needs to change this mapping
arbitrarily, asynchronously, and at any point in time to be able to
share and unshare.
It will never work for pure PV.

> 9) Aside from the occasional academic projects, is memory sharing
> being actively used anywhere?

Gridcentric Inc for sure, all the time.

Andres

> 10) Any caveats besides the general risk of an overcommitted set of
> domains requiring more memory than is available?
>
> Sorry; that was a few more questions than I started off with. They
> are in order of importance, if that helps...
>
> Thank you,
> Eric
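The scan, nominate, rescan, share protocol described in the answer to question 3 can be sketched as a simulation. Everything below is an illustrative stand-in, not Xen code: in real tooling the nominate/share calls would be libxc hypercall wrappers, and it is the hypervisor itself that rejects the share if the page changed after nomination (the TOCTTOU check).

```python
# Simulation of the scan -> nominate -> rescan -> share protocol for a
# naive zero-page scanner.  SimulatedHypervisor is hypothetical: it
# stands in for the hypervisor, which snapshots a page at nomination
# and refuses to share it if the contents changed afterwards.

PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)

class SimulatedHypervisor:
    def __init__(self, pages):
        self.pages = pages            # gfn -> mutable page contents
        self.handles = {}             # nomination handle -> (gfn, snapshot)
        self.next_handle = 0
        self.shared = set()           # gfns collapsed onto the zero page

    def nominate(self, gfn):
        """Nominate a page for sharing; returns an opaque handle."""
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = (gfn, bytes(self.pages[gfn]))
        return handle

    def share(self, handle):
        """Share a nominated page; fails if it changed since nomination."""
        gfn, snapshot = self.handles.pop(handle)
        if bytes(self.pages[gfn]) != snapshot:
            return False              # TOCTTOU detected by the hypervisor
        self.shared.add(gfn)
        return True

def scan_and_share_zero_pages(hv):
    shared = 0
    for gfn in list(hv.pages):
        if bytes(hv.pages[gfn]) != ZERO_PAGE:
            continue                  # first scan: find candidate zero pages
        handle = hv.nominate(gfn)     # nominate for sharing
        if bytes(hv.pages[gfn]) != ZERO_PAGE:
            continue                  # second scan: narrow the race window
        if hv.share(handle):          # hypervisor re-checks regardless
            shared += 1
    return shared
```

Note that the second scan only narrows the window; the authoritative check is the snapshot comparison inside share(), which is why the hypervisor can safely fail the call on its own.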
Thank you for taking the time; it answered a number of lingering
questions.

If anyone is successfully using blkback2 with memshr, I would
appreciate hearing about it.

>> 6) Does OS caching need to be disabled, or more likely dialed down
>> to a small maximum, to ensure free memory gets shared and
>> relinquished?
>
> OS guest side caching is irrelevant. OS dom0 caching is disabled in
> most cases when using blkback or blktap. I am not sure what you are
> talking about here though: you are linking caching to freeing memory
> to sharing. Caching is a good thing and that's precisely what
> tools/memshr, the page cache on the OS guest side, aims to share.

Maybe an example will avoid abusing some of those terms:

Given a standard Linux system with swap disabled, a set of processes
will have a certain minimum RAM requirement for instructions and data
(I have seen the term "working set" applied) - for example, more than
1GB of RAM will be needed to run a large full-chip simulation.

Generally there is more than that minimum amount of RAM. Over time,
except with a small working set of programs and data, all or most of
the excess RAM tends to be put to use in the page cache (rather than
being entirely unused), but is readily available to be directly
assigned to processes. I assume pages in the page cache would contain
useful, non-zero data, such as recent block I/O data. This appears to
exclude zero-page sharing, and may not be all that successful for
same-page sharing.

In general, I probably would find it more useful if (most of) the
excess pages were available for other domains via sharing. For
example, maybe it would be better for the excess pages to be part of
the page cache for a storage domain common to the other domains,
allowing it to make more holistic caching decisions and hopefully
already have the more active blocks in its cache - perhaps affording
some TMEM-like benefits to non-TMEM-capable OSes (which is pretty much
anything other than Linux?).
The question was mainly: if I lazily/conservatively overallocate
excess memory to domains, and hope page sharing will automagically
minimize their footprint, will the use and dirtying of excess pages by
the page cache cripple their sharing? If so, I am curious whether it
would make sense to cap the page cache, if possible, at say 100MB. I
suspect totally disabling the page cache is impossible or destroys
performance.
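The concern in that last question can be put into rough numbers. The following is a back-of-the-envelope sketch, not a measurement; the domain count, allocation, working set, and dirtied fraction are all hypothetical parameters:

```python
# Rough model of overallocation with zero-page sharing: each domain is
# allocated alloc_gb but actively uses working_gb; whatever fraction of
# the excess the page cache dirties holds non-zero data and cannot be
# collapsed onto the shared zero page.  All numbers are illustrative.

def host_footprint_gb(n_domains, alloc_gb, working_gb, cache_fraction):
    excess = alloc_gb - working_gb
    dirty = excess * cache_fraction      # excess filled by the page cache
    return n_domains * (working_gb + dirty)

# 10 domains, each allocated 4GB with a 1GB working set:
capped = host_footprint_gb(10, 4, 1, 0.1)    # page cache capped small
uncapped = host_footprint_gb(10, 4, 1, 1.0)  # cache dirties all excess
```

With the cache capped, the 10 domains fit in roughly 13GB; with the page cache left to consume all excess memory, nothing beyond the allocation is reclaimable and the footprint is the full 40GB, which is exactly the "crippled sharing" scenario being asked about.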
On Aug 9, 2013, at 12:34 PM, Eric Shelton <eshelton@pobox.com> wrote:

> Thank you for taking the time; it answered a number of lingering
> questions.
>
> If anyone is successfully using blkback2 with memshr, I would
> appreciate hearing about it.
>
>>> 6) Does OS caching need to be disabled, or more likely dialed down
>>> to a small maximum, to ensure free memory gets shared and
>>> relinquished?
>>
>> OS guest side caching is irrelevant. OS dom0 caching is disabled in
>> most cases when using blkback or blktap. I am not sure what you are
>> talking about here though: you are linking caching to freeing memory
>> to sharing. Caching is a good thing and that's precisely what
>> tools/memshr, the page cache on the OS guest side, aims to share.
>
> Maybe an example will avoid abusing some of those terms:
>
> Given a standard Linux system with swap disabled, a set of processes
> will have a certain minimum RAM requirement for instructions and data
> (I have seen the term "working set" applied) - for example, more than
> 1GB of RAM will be needed to run a large full-chip simulation.
>
> Generally there is more than that minimum amount of RAM. Over time,
> except with a small working set of programs and data, all or most of
> the excess RAM tends to be put to use in the page cache (rather than
> being entirely unused), but is readily available to be directly
> assigned to processes. I assume pages in the page cache would contain
> useful, non-zero data, such as recent block I/O data. This appears to
> exclude zero-page sharing, and may not be all that successful for
> same-page sharing.
>
> In general, I probably would find it more useful if (most of) the
> excess pages were available for other domains via sharing.
> For example, maybe it would be better for the excess pages to be part
> of the page cache for a storage domain common to the other domains,
> allowing it to make more holistic caching decisions and hopefully
> already have the more active blocks in its cache - perhaps affording
> some TMEM-like benefits to non-TMEM-capable OSes (which is pretty
> much anything other than Linux?).

That whole description really seems like TMEM.

> The question was mainly: if I lazily/conservatively overallocate
> excess memory to domains, and hope page sharing will automagically
> minimize their footprint, will the use and dirtying of excess pages
> by the page cache cripple their sharing? If so, I am curious whether
> it would make sense to cap the page cache, if possible, at say 100MB.
> I suspect totally disabling the page cache is impossible or destroys
> performance.

You just can't do that in Linux. The page cache is so intimately baked
in, there is no notion of "turning it off".

Andres
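Since TMEM keeps coming up: the "common caching storage domain" idea maps onto TMEM's ephemeral pools, whose defining property is a put/get store with no retention guarantee. The sketch below is conceptual only; the real Xen tmem interface differs in its details, and the class and keys here are illustrative inventions.

```python
# Conceptual sketch of a TMEM-style ephemeral pool: guests "put"
# evicted clean page-cache pages into a host-side pool and may "get"
# them back on a refault, but the host may drop any entry at any time.
# This mimics the semantics only; it is not the Xen tmem ABI.

class EphemeralPool:
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.store = {}               # (domain, object, index) -> page data

    def put(self, key, data):
        if len(self.store) >= self.capacity:
            # Host under memory pressure: silently drop the oldest entry.
            self.store.pop(next(iter(self.store)))
        self.store[key] = data

    def get(self, key):
        # A miss is legitimate: the host never promised to keep the page.
        return self.store.pop(key, None)
```

The "not readily resizeable" worry raised above corresponds to the capacity parameter here: because entries are droppable, the host can shrink the pool at will, which is precisely what a conventional storage-domain page cache cannot do as easily.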
On Mon, Aug 12, 2013 at 10:58 AM, Andres Lagar-Cavilla
<andreslc@gridcentric.ca> wrote:

> > > ... The "naive page scanner" as you put it, would scan, find a
> > > zero page, nominate for sharing, scan again to prevent TOCTTOU
> > > race between end of scan and nomination, and then attempt to
> > > share. Note that the sharing hypercall will automatically fail in
> > > the case of TOCTTOU between nominate and share actions.

Given that the hypervisor will reject the TOCTTOU case, doesn't that
eliminate any race hazard? I saw the second scan mentioned in a paper
as well, but if it only shrinks, but does not close, the TOCTTOU
window, why bother? Are you trying to avoid the expense of a
hypercall? If that is an issue, maybe it is worth having versions of
xc_memshr_nominate_gfn() and xc_memshr_share_gfns() that handle more
than one pair of pages at a time.

> > In general, I probably would find it more useful if (most of) the
> > excess pages were available for other domains via sharing. For
> > example, maybe it would be better for the excess pages to be part
> > of the page cache for a storage domain common to the other domains,
> > allowing it to make more holistic caching decisions and hopefully
> > already have the more active blocks in its cache - perhaps
> > affording some TMEM-like benefits to non-TMEM-capable OSes (which
> > is pretty much anything other than Linux?).
>
> That whole description really seems like TMEM.

As best as I understand it, this would provide only very limited
aspects of TMEM, and perhaps a substitute for ballooning.
Nevertheless, maybe there is a win in giving these more limited, but
still significant, benefits to non-TMEM-aware and/or
non-ballooning-aware domains.

TMEM-like aspect: The common caching storage domain is perhaps
something like cleancache. However, this cache may not be readily
resizeable to respond to host-wide memory pressures.
Balloon-like aspect: If guest operating systems can be persuaded to
(1) limit the size of their page cache, and (2) zero out the remaining
free pages, zero-page merging might be an alternative to ballooning,
and could even avoid issues arising from the balloon driver not
meeting kernel memory demands quickly enough (e.g., allocate 8GB of
memory, of which 6GB is typically free, zeroed, and shared). However,
Linux does not seem to be tunable to provide either (1) or (2), and
would have to be, and I think can be, patched to do so. Windows
provides SetSystemFileCacheSize(), which appears to often be used to
reduce VM memory footprint and provides (1), and my understanding is
that (2) is default Windows behavior.

>> The question was mainly: if I lazily/conservatively overallocate
>> excess memory to domains, and hope page sharing will automagically
>> minimize their footprint, will the use and dirtying of excess pages
>> by the page cache cripple their sharing? If so, I am curious
>> whether it would make sense to cap the page cache, if possible, at
>> say 100MB. I suspect totally disabling the page cache is impossible
>> or destroys performance.
>
> You just can't do that in Linux. The page cache is so intimately
> baked in, there is no notion of "turning it off".

It is not a surprise that it cannot be turned off. Even if it were, I
imagine trying to run with uncached block I/O, even if a storage
domain held the data in its cache, would drop performance by several
orders of magnitude. However, setting a maximum number of pages for
the page cache seems doable. There is a clearly identifiable value (or
sum of the value across zones) in the kernel that I would like to put
a ceiling on, nr_file_pages, which can be seen in /proc/zoneinfo and
directly corresponds to the "buffers" and "cached" values indicated by
free (specifically, nr_file_pages * 4 = buffers + cached).
Perhaps all that is needed is an additional sysctl value,
max_nr_file_pages, which would establish a ceiling for nr_file_pages,
and getting the page cache to respect it. As the benefit of this is
then realized by zeroing out the remaining free pages for merging,
zeroing would also ideally occur right after pages were freed. Exactly
how this happens seems to be a more delicate operation, since zeroing
a page probably takes a nontrivial amount of time as far as the VMM is
concerned. Does Linux follow an MRU policy for reallocating free
pages? It would be helpful not to waste time zeroing the next n pages
up for allocation.

- Eric
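The behavior the proposed knob would give can be sketched in a few lines. To be clear, max_nr_file_pages is the hypothetical sysctl from the message above; no such tunable exists in mainline Linux, and the class below is an illustration of the eviction-plus-zeroing policy, not kernel code:

```python
# Sketch of a page cache bounded by a hypothetical max_nr_file_pages
# ceiling: inserting beyond the cap evicts least-recently-used entries,
# and each evicted page is counted as zeroed, i.e. made available for
# zero-page merging by a scanner.

from collections import OrderedDict

class CappedPageCache:
    def __init__(self, max_nr_file_pages):
        self.max_pages = max_nr_file_pages
        self.cache = OrderedDict()    # block -> data, in LRU order
        self.freed_zeroed = 0         # pages zeroed and ready to merge

    def insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)          # mark most recently used
        while len(self.cache) > self.max_pages:
            self.cache.popitem(last=False)     # evict the LRU page...
            self.freed_zeroed += 1             # ...and zero it for merging

cache = CappedPageCache(max_nr_file_pages=3)
for blk in range(5):
    cache.insert(blk, b'data')
```

The MRU question at the end of the message would show up here as a refinement: rather than zeroing every evicted page immediately, one would hold back the few pages most likely to be reallocated next, so the zeroing effort is not wasted.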