With the Xen memory overcommit (Satori and xenpaging) work getting closer to the real world (and indeed one gating factor for a major Xen release), I wonder if it is time to ask some very pointed questions:

1) How is the capability and implementation similar or different from VMware's? And specifically I'm asking for hard information relating to:

http://lwn.net/Articles/309155/
http://lwn.net/Articles/330589/

I am not a lawyer and my employer forbids me from reading the related patent claims or speculating on any related issues, but I will be strongly recommending a thorough legal review before Oracle ships this code in any form that customers can enable. (I'm hoping for an answer that would render a review moot.)

2) Assuming no legal issues, how is Xen memory overcommit different or better than VMware's, which is known to have lots of issues in the real world, such that few customers (outside of a handful of domains such as VDI) enable it? Or is this effort largely to remove an item from the VMware sales team's differentiation list? And a comparison vs Hyper-V and KVM would be interesting also.

3) Is there new evidence that a host-based-policy-driven memory balancer works sufficiently well on one system, or for multiple hosts, or for a data center? It would be nice for all Xen developers/vendors to understand the intended customer (e.g. is it the desktop user running a handful of VMs running known workloads?)

Perhaps this would be a better topic for the Xen Hack-a-thon... sadly I won't be there and, anyway, I don't know if there will be a quorum present of the Xen developers specifically working on memory overcommit technology, so I thought it should be brought up on-list beforehand.

Dan

Thanks... for the memory!
I really could use more / my throughput's on the floor
The balloon is flat / my swap disk's fat / I've OOM's in store
Overcommitted so much
(with apologies to Bob Hope)
On Fri, Feb 24, Dan Magenheimer wrote:
> With the Xen memory overcommit (Satori and xenpaging) work getting
> closer to the real world (and indeed one gating factor for a major
> Xen release), I wonder if it is time to ask some very pointed
> questions:

A few comments from me below. I just learned about Satori now; it's not clear to me how it is related to memory overcommit. To me memory overcommit means swapping, which is what xenpaging does: turn the whole guest gfn range into some sort of virtual memory, transparent to the guest.

> 1) How is the capability and implementation similar or different
> from VMware's? And specifically I'm asking for hard information
> relating to:
>
> http://lwn.net/Articles/309155/
> http://lwn.net/Articles/330589/

KSM looks more like page sharing, which Xen can also do to some degree. I'm not familiar with the sharing code in either Xen or the Linux kernel.

> 2) Assuming no legal issues, how is Xen memory overcommit different
> or better than VMware's, which is known to have lots of issues
> in the real world, such that few customers (outside of a handful
> of domains such as VDI) enable it? Or is this effort largely to
> remove an item from the VMware sales team's differentiation list?
> And a comparison vs Hyper-V and KVM would be interesting also.

I have no idea what VMware provides to fill the memory overcommit checkbox. The Win8 preview I tested recently offers some sort of memory handling for guests; so far I have not looked into that feature. Since KVM guests are ordinary processes, AFAIK they are most likely victims of Linux kernel process swapping (unless KVM mlocks the gfns?). So KVM gets, for free, memory overcommit in the way xenpaging provides it.

> 3) Is there new evidence that a host-based-policy-driven memory
> balancer works sufficiently well on one system, or for
> multiple hosts, or for a data center? It would be nice for
> all Xen developers/vendors to understand the intended customer
> (e.g. is it the desktop user running a handful of VMs running
> known workloads?)

xenpaging is the red emergency knob to free some host memory without caring about the actual memory constraints within the paged guests.

Olaf
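To make Olaf's description concrete, here is a toy model of the xenpaging idea -- treating a guest's gfn space as demand-paged virtual memory, transparent to the guest. All names here are illustrative; the real pager goes through Xen's mem-paging interfaces rather than these helpers.

# Toy model only: evict a guest page to a pagefile to free host memory,
# and page it back in before the guest ever sees it missing.
class ToyPager:
    def __init__(self, gfns):
        self.ram = {gfn: f"contents-of-{gfn}" for gfn in gfns}  # pages resident in host RAM
        self.pagefile = {}                                      # pages evicted to disk

    def evict(self, gfn):
        """Free host memory by writing a guest page out to the pagefile."""
        if gfn in self.ram:
            self.pagefile[gfn] = self.ram.pop(gfn)

    def guest_access(self, gfn):
        """On a 'page fault', bring the contents back before the guest uses it."""
        if gfn not in self.ram:
            self.ram[gfn] = self.pagefile.pop(gfn)
        return self.ram[gfn]

pager = ToyPager(range(4))
pager.evict(2)                                      # host memory freed, guest unaware
assert pager.guest_access(2) == "contents-of-2"     # transparently paged back in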
At 12:53 -0800 on 24 Feb (1330088020), Dan Magenheimer wrote:
> 1) How is the capability and implementation similar or different
> from VMware's? And specifically I'm asking for hard information
> relating to:
>
> http://lwn.net/Articles/309155/
> http://lwn.net/Articles/330589/
>
> I am not a lawyer and my employer forbids me from reading the
> related patent claims or speculating on any related issues, but
> I will be strongly recommending a thorough legal review before
> Oracle ships this code in any form that customers can enable.
> (I'm hoping for an answer that would render a review moot.)

I am not a lawyer and my employer forbids me from reading the related patent claims or speculating on any related issues. :P

> 2) Assuming no legal issues, how is Xen memory overcommit different
> or better than VMware's, which is known to have lots of issues
> in the real world, such that few customers (outside of a handful
> of domains such as VDI) enable it? Or is this effort largely to
> remove an item from the VMware sales team's differentiation list?
> And a comparison vs Hyper-V and KVM would be interesting also.

The blktap-based page-sharing tool doesn't use content hashes to find pages to share; it relies on storage-layer knowledge to detect disk reads that will have identical results. Grzegorz's PhD dissertation and the paper on Satori discuss why that's a better idea than trying to find shareable pages by scanning.

I agree that using page sharing to try to recover memory for higher VM density is, let's say, challenging. But in certain specific workloads (e.g. snowflock &c), or if you're doing something else with the recovered memory (e.g. tmem?) then it makes more sense.

I have no direct experience of real-world deployments.

> 3) Is there new evidence that a host-based-policy-driven memory
> balancer works sufficiently well on one system, or for
> multiple hosts, or for a data center?

That, I think, is an open research question.

> It would be nice for
> all Xen developers/vendors to understand the intended customer
> (e.g. is it the desktop user running a handful of VMs running
> known workloads?)

With my hypervisor hat on, we've tried to make a sensible interface where all the policy-related decisions that this question would apply to can be made in the tools. (I realise that I'm totally punting on the question).

> Perhaps this would be a better topic for the Xen Hack-a-thon...
> sadly I won't be there and, anyway, I don't know if there will
> be a quorum present of the Xen developers specifically working
> on memory overcommit technology, so I thought it should be
> brought up on-list beforehand.

I won't be at the hackathon either.

Cheers,

Tim.
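For contrast, a rough sketch of the two approaches Tim distinguishes, with made-up data structures -- this is not the actual blktap/Satori or KSM code, just an illustration of where the sharing decision comes from in each case.

import hashlib

# Storage-aware (Satori-style): the block layer already knows two guests read
# the same disk sector, so the resulting frames can be shared without scanning.
reads_by_sector = {}          # (disk_id, sector) -> machine frame already holding that data

def on_disk_read(disk_id, sector, new_frame):
    key = (disk_id, sector)
    if key in reads_by_sector:
        return ("share", reads_by_sector[key])   # map the guest page to the existing frame
    reads_by_sector[key] = new_frame
    return ("keep", new_frame)

# Scanning (KSM-style): periodically hash page contents and merge identical pages.
def scan_for_duplicates(frames):
    seen = {}
    shared = []
    for frame, contents in frames.items():
        digest = hashlib.sha1(contents).digest()
        if digest in seen and frames[seen[digest]] == contents:  # hash, then full compare
            shared.append((frame, seen[digest]))
        else:
            seen[digest] = frame
    return shared

print(on_disk_read("vda", 100, 0x1000))   # ("keep", 4096)
print(on_disk_read("vda", 100, 0x2000))   # ("share", 4096): second guest reads the same block
print(scan_for_duplicates({1: b"A" * 4096, 2: b"A" * 4096, 3: b"B" * 4096}))  # [(2, 1)]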
> From: Olaf Hering [mailto:olaf@aepfle.de]

Hi Olaf --

Thanks for the reply! Since Tim answers my questions later in the thread, one quick comment...

> To me memory overcommit means swapping, which is what xenpaging does:
> turn the whole guest gfn range into some sort of virtual memory,
> transparent to the guest.
>
> xenpaging is the red emergency knob to free some host memory without
> caring about the actual memory constraints within the paged guests.

Sure, but the whole point of increasing RAM in one or more guests is to increase performance, and if overcommitting *always* means swapping, why would anyone use it?

So xenpaging is fine and useful, but IMHO only in conjunction with some other technology that reduces total physical RAM usage to less than sum(max_mem(all VMs)).

Dan
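Dan's condition, spelled out with hypothetical numbers (GiB); the figures are made up purely to illustrate the check.

# The host is overcommitted when the sum of guests' max_mem exceeds host RAM;
# it only works well if sharing/compression/ballooning keep *actual* usage
# below host RAM, so that swapping stays the exception rather than the rule.
host_ram = 16
max_mem = {"vm1": 8, "vm2": 8, "vm3": 8}
actual_use = {"vm1": 5, "vm2": 4, "vm3": 4}      # after ballooning/sharing/tmem

overcommitted = sum(max_mem.values()) > host_ram              # True: 24 > 16
fits_without_swapping = sum(actual_use.values()) <= host_ram  # True: 13 <= 16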
> From: Tim Deegan [mailto:tim@xen.org]
> Subject: Re: Pointed questions re Xen memory overcommit

Thanks much for the reply Tim!

> At 12:53 -0800 on 24 Feb (1330088020), Dan Magenheimer wrote:
> > 1) How is the capability and implementation similar or different
> > from VMware's? And specifically I'm asking for hard information
> > relating to:
> >
> > http://lwn.net/Articles/309155/
> > http://lwn.net/Articles/330589/
> >
> > I am not a lawyer and my employer forbids me from reading the
> > related patent claims or speculating on any related issues, but
> > I will be strongly recommending a thorough legal review before
> > Oracle ships this code in any form that customers can enable.
> > (I'm hoping for an answer that would render a review moot.)
>
> I am not a lawyer and my employer forbids me from reading the
> related patent claims or speculating on any related issues. :P

Heh. If there is a smiley-face that means "we both roll our eyes at the insanity of lawyers", put it here.

> > 2) Assuming no legal issues, how is Xen memory overcommit different
> > or better than VMware's, which is known to have lots of issues
> > in the real world, such that few customers (outside of a handful
> > of domains such as VDI) enable it? Or is this effort largely to
> > remove an item from the VMware sales team's differentiation list?
> > And a comparison vs Hyper-V and KVM would be interesting also.
>
> The blktap-based page-sharing tool doesn't use content hashes to find
> pages to share; it relies on storage-layer knowledge to detect disk
> reads that will have identical results. Grzegorz's PhD dissertation and
> the paper on Satori discuss why that's a better idea than trying to find
> shareable pages by scanning.

Excellent.

> I agree that using page sharing to try to recover memory for higher VM
> density is, let's say, challenging. But in certain specific workloads
> (e.g. snowflock &c), or if you're doing something else with the
> recovered memory (e.g. tmem?) then it makes more sense.
>
> I have no direct experience of real-world deployments.

Me neither, just some comments from a few VMware users. I agree Satori and/or "the other page sharing" may make good sense in certain heavily redundant workloads.

> > 3) Is there new evidence that a host-based-policy-driven memory
> > balancer works sufficiently well on one system, or for
> > multiple hosts, or for a data center?
>
> That, I think, is an open research question.

OK, that's what I thought.

> > It would be nice for
> > all Xen developers/vendors to understand the intended customer
> > (e.g. is it the desktop user running a handful of VMs running
> > known workloads?)
>
> With my hypervisor hat on, we've tried to make a sensible interface
> where all the policy-related decisions that this question would apply to
> can be made in the tools. (I realise that I'm totally punting on the
> question).

That's OK... though "sensible" is difficult to measure without a broader context. IMHO (and I know I'm in a minority), punting to the tools only increases the main problem (semantic gap).
On Mon, Feb 27, 2012 at 11:40 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>> From: Olaf Hering [mailto:olaf@aepfle.de]
>
> Hi Olaf --
>
> Thanks for the reply! Since Tim answers my questions later in the
> thread, one quick comment...
>
>> To me memory overcommit means swapping, which is what xenpaging does:
>> turn the whole guest gfn range into some sort of virtual memory,
>> transparent to the guest.
>>
>> xenpaging is the red emergency knob to free some host memory without
>> caring about the actual memory constraints within the paged guests.
>
> Sure, but the whole point of increasing RAM in one or more guests
> is to increase performance, and if overcommitting *always* means
> swapping, why would anyone use it?
>
> So xenpaging is fine and useful, but IMHO only in conjunction
> with some other technology that reduces total physical RAM usage
> to less than sum(max_mem(all VMs)).

I agree -- overcommitting means giving the guests the illusion of more aggregate memory than there is. Paging is one way of doing that; page sharing is another way. The big reason paging is needed is if guests start to "call in" the commitments, by writing to previously shared pages. I would think tmem would also come under "memory overcommit".

 -George
> From: George Dunlap [mailto:dunlapg@umich.edu]
> Subject: Re: [Xen-devel] Pointed questions re Xen memory overcommit
>
> On Mon, Feb 27, 2012 at 11:40 PM, Dan Magenheimer
> <dan.magenheimer@oracle.com> wrote:
> >> From: Olaf Hering [mailto:olaf@aepfle.de]
> >
> > Hi Olaf --
> >
> > Thanks for the reply! Since Tim answers my questions later in the
> > thread, one quick comment...
> >
> >> To me memory overcommit means swapping, which is what xenpaging does:
> >> turn the whole guest gfn range into some sort of virtual memory,
> >> transparent to the guest.
> >>
> >> xenpaging is the red emergency knob to free some host memory without
> >> caring about the actual memory constraints within the paged guests.
> >
> > Sure, but the whole point of increasing RAM in one or more guests
> > is to increase performance, and if overcommitting *always* means
> > swapping, why would anyone use it?
> >
> > So xenpaging is fine and useful, but IMHO only in conjunction
> > with some other technology that reduces total physical RAM usage
> > to less than sum(max_mem(all VMs)).
>
> I agree -- overcommitting means giving the guests the illusion of more
> aggregate memory than there is. Paging is one way of doing that; page
> sharing is another way. The big reason paging is needed is if guests
> start to "call in" the commitments, by writing to previously shared
> pages. I would think tmem would also come under "memory overcommit".

Yes and no. By default, tmem's primary role is to grease the transfer of RAM capacity from one VM to another while minimizing the loss of performance that occurs when aggressively selfballooning (or maybe doing "host-policy-driven-ballooning-with-a-semantic-gap"). However, tmem has two optional features, "tmem_compress" and "tmem_dedup", which do result in "memory overcommit", and neither has the "call in the commitments" issue that occurs with shared pages, so tmem does not require xenpaging.

That said, I can conceive of a RAMster*-like implementation for which the ability to move hypervisor pages to dom0 might be useful/necessary, so some parts of the xenpaging code in the hypervisor might be required.

* http://lwn.net/Articles/481681/
On Mon, Feb 27, Dan Magenheimer wrote:
> > From: Olaf Hering [mailto:olaf@aepfle.de]
> > To me memory overcommit means swapping, which is what xenpaging does:
> > turn the whole guest gfn range into some sort of virtual memory,
> > transparent to the guest.
> >
> > xenpaging is the red emergency knob to free some host memory without
> > caring about the actual memory constraints within the paged guests.
>
> Sure, but the whole point of increasing RAM in one or more guests
> is to increase performance, and if overcommitting *always* means
> swapping, why would anyone use it?

The usage patterns depend on the goal. As I wrote, swapping frees host memory so it can be used for something else, like starting yet another guest on the host.

Olaf
> From: Olaf Hering [mailto:olaf@aepfle.de]
> Subject: Re: [Xen-devel] Pointed questions re Xen memory overcommit
>
> On Mon, Feb 27, Dan Magenheimer wrote:
>
> > > From: Olaf Hering [mailto:olaf@aepfle.de]
> > > To me memory overcommit means swapping, which is what xenpaging does:
> > > turn the whole guest gfn range into some sort of virtual memory,
> > > transparent to the guest.
> > >
> > > xenpaging is the red emergency knob to free some host memory without
> > > caring about the actual memory constraints within the paged guests.
> >
> > Sure, but the whole point of increasing RAM in one or more guests
> > is to increase performance, and if overcommitting *always* means
> > swapping, why would anyone use it?
>
> The usage patterns depend on the goal. As I wrote, swapping frees host
> memory so it can be used for something else, like starting yet another
> guest on the host.

OK, I suppose xenpaging by itself could be useful in a situation such as:

1) A failover occurs from machine A that has lots of RAM to machine B that has much less RAM, and even horribly bad performance is better than total service interruption.
2) All currently running VMs have been ballooned down "far enough" and either have no swap device or insufficiently-sized swap devices, Xen simply has no more free space, and horrible performance is acceptable.

The historical problem with "hypervisor-based-swapping" solutions such as xenpaging is that it is impossible to ensure that "horribly bad performance" doesn't start occurring under "normal" circumstances specifically because (as Tim indirectly concurs below), policies driven by heuristics and external inference (i.e. dom0 trying to guess how much memory every domU "needs") just don't work.

As a result, VMware customers outside of some very specific domains (domains possibly overlapping with Snowflock?) will tell you that "memory overcommit sucks and so we turn it off".

Which is why I raised the question "why are we doing this?"... If the answer is "Snowflock customers will benefit from it", that's fine. If the answer is "to handle disastrous failover situations", that's fine too. And I suppose if the answer is "so that VMware can't claim to have a feature that Xen doesn't have (even if almost nobody uses it)", I suppose that's fine too.

I'm mostly curious because I've spent the last four years trying to solve this problem in a more intelligent way and am wondering if the "old way" has improved, or is still just the old way but mostly warmed over for Xen. And, admittedly, a bit jealous because there's apparently so much effort going into the "old way" and not toward "a better way".

Dan

> <following pasted from earlier in thread>
> From: Tim Deegan [mailto:tim@xen.org]
> Subject: Re: Pointed questions re Xen memory overcommit
>
> At 12:53 -0800 on 24 Feb (1330088020), Dan Magenheimer wrote:
> > 3) Is there new evidence that a host-based-policy-driven memory
> > balancer works sufficiently well on one system, or for
> > multiple hosts, or for a data center?
>
> That, I think, is an open research question.
>> From: Olaf Hering [mailto:olaf@aepfle.de]
>> Subject: Re: [Xen-devel] Pointed questions re Xen memory overcommit
>>
>> On Mon, Feb 27, Dan Magenheimer wrote:
>>
>> > > From: Olaf Hering [mailto:olaf@aepfle.de]
>> > > To me memory overcommit means swapping, which is what xenpaging does:
>> > > turn the whole guest gfn range into some sort of virtual memory,
>> > > transparent to the guest.
>> > >
>> > > xenpaging is the red emergency knob to free some host memory without
>> > > caring about the actual memory constraints within the paged guests.
>> >
>> > Sure, but the whole point of increasing RAM in one or more guests
>> > is to increase performance, and if overcommitting *always* means
>> > swapping, why would anyone use it?
>>
>> The usage patterns depend on the goal. As I wrote, swapping frees host
>> memory so it can be used for something else, like starting yet another
>> guest on the host.
>
> OK, I suppose xenpaging by itself could be useful in a situation such as:
>
> 1) A failover occurs from machine A that has lots of RAM to machine B
> that has much less RAM, and even horribly bad performance is better
> than total service interruption.
> 2) All currently running VMs have been ballooned down "far enough"
> and either have no swap device or insufficiently-sized swap devices,
> Xen simply has no more free space, and horrible performance is
> acceptable.
>
> The historical problem with "hypervisor-based-swapping" solutions such
> as xenpaging is that it is impossible to ensure that "horribly bad
> performance" doesn't start occurring under "normal" circumstances
> specifically because (as Tim indirectly concurs below), policies
> driven by heuristics and external inference (i.e. dom0 trying
> to guess how much memory every domU "needs") just don't work.
>
> As a result, VMware customers outside of some very specific domains
> (domains possibly overlapping with Snowflock?) will tell you
> that "memory overcommit sucks and so we turn it off".
>
> Which is why I raised the question "why are we doing this?"...
> If the answer is "Snowflock customers will benefit from it",

How come SnowFlock crept in here? :)

I can unequivocally assert there is no such thing as "SnowFlock customers".

You have to keep in mind that paging is 1. not bad to have, 2. powerful and generic, and 3. a far more generic mechanism for populating on demand than what is labeled in the hypervisor as "populate-on-demand".

Re 2. you could implement a balloon using a pager -- or you could implement a version of ramster by putting the page file on a fuse fs with compression turned on. Not that you would want to, just to prove a point.

And re 3. not that there's anything wrong with PoD, but it has several assumptions baked in about being a temporary balloon replacement. I predict that once the 32-bit hypervisor and shadow mode are phased out, PoD will also go away, as it will be a "simple" sub-case of paging.

Olaf Hering from SuSE invested significant time and effort in getting paging to where it is, so you also have to add to the list whatever his/their motivations are.

Andres

> that's fine. If the answer is "to handle disastrous failover
> situations", that's fine too. And I suppose if the answer is
> "so that VMware can't claim to have a feature that Xen doesn't
> have (even if almost nobody uses it)", I suppose that's fine too.
>
> I'm mostly curious because I've spent the last four years
> trying to solve this problem in a more intelligent way
> and am wondering if the "old way" has improved, or is
> still just the old way but mostly warmed over for Xen.
> And, admittedly, a bit jealous because there's apparently
> so much effort going into the "old way" and not toward
> "a better way".
>
> Dan
>
>> <following pasted from earlier in thread>
>> From: Tim Deegan [mailto:tim@xen.org]
>> Subject: Re: Pointed questions re Xen memory overcommit
>>
>> At 12:53 -0800 on 24 Feb (1330088020), Dan Magenheimer wrote:
>> > 3) Is there new evidence that a host-based-policy-driven memory
>> > balancer works sufficiently well on one system, or for
>> > multiple hosts, or for a data center?
>>
>> That, I think, is an open research question.
> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org]
> > OK, I suppose xenpaging by itself could be useful in a situation such as:
> >
> > 1) A failover occurs from machine A that has lots of RAM to machine B
> > that has much less RAM, and even horribly bad performance is better
> > than total service interruption.
> > 2) All currently running VMs have been ballooned down "far enough"
> > and either have no swap device or insufficiently-sized swap devices,
> > Xen simply has no more free space, and horrible performance is
> > acceptable.
> >
> > The historical problem with "hypervisor-based-swapping" solutions such
> > as xenpaging is that it is impossible to ensure that "horribly bad
> > performance" doesn't start occurring under "normal" circumstances
> > specifically because (as Tim indirectly concurs below), policies
> > driven by heuristics and external inference (i.e. dom0 trying
> > to guess how much memory every domU "needs") just don't work.
> >
> > As a result, VMware customers outside of some very specific domains
> > (domains possibly overlapping with Snowflock?) will tell you
> > that "memory overcommit sucks and so we turn it off".
> >
> > Which is why I raised the question "why are we doing this?"...
> > If the answer is "Snowflock customers will benefit from it",
>
> How come SnowFlock crept in here? :)
>
> I can unequivocally assert there is no such thing as "SnowFlock customers".

Sorry, no ill will intended. Tim (I think) earlier in this thread suggested that page-sharing might benefit snowflock-like workloads.

> Olaf Hering from SuSE invested significant time and effort in getting
> paging to where it is, so you also have to add to the list whatever
> his/their motivations are.

Thus my curiosity... if Novell has some super-secret plans that we aren't privileged to know, that's fine. Otherwise, I was trying to understand the motivations.

> You have to keep in mind that paging is 1. not bad to have, 2. powerful
> and generic, and 3. a far more generic mechanism for populating on demand
> than what is labeled in the hypervisor as "populate-on-demand".
>
> Re 2. you could implement a balloon using a pager -- or you could
> implement a version of ramster by putting the page file on a fuse fs with
> compression turned on. Not that you would want to, just to prove a point.
>
> And re 3. not that there's anything wrong with PoD, but it has several
> assumptions baked in about being a temporary balloon replacement. I
> predict that once the 32-bit hypervisor and shadow mode are phased out, PoD
> will also go away, as it will be a "simple" sub-case of paging.

I *think* we are all working on the same goal of "reduce RAM as a bottleneck *without* a big performance hit". With xenpaging, I fear Xen customers will be excited about "reduce/eliminate RAM as a bottleneck" and then be surprised when there IS a big performance hit. I also fear that, with current policy technology, it will be impossible to draw any sane line to implement "I want to reduce/eliminate RAM as a bottleneck *as much as possible* WITHOUT a big performance hit".

In other words, I am hoping to avoid repeating all the same mistakes that VMware has already gone through and getting all the same results VMware customers have already gone through, e.g. "memory overcommit sucks so just turn it off." This would be IMHO the classic definition of insanity.
http://www.quotationspage.com/quote/26032.html

As for PoD and paging, if adding xenpaging or replacing PoD with xenpaging ensures that a guest continues to run in situations where PoD would have caused the guest to crash, great! But if xenpaging makes performance suck where PoD was doing just fine, see above.

(And, P.S., apologies to Jan who HAS invested time and energy into tmem.)
On Wed, Feb 29, 2012 at 5:26 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>> From: Olaf Hering [mailto:olaf@aepfle.de]
>> Subject: Re: [Xen-devel] Pointed questions re Xen memory overcommit
>>
>> On Mon, Feb 27, Dan Magenheimer wrote:
>>
>> > > From: Olaf Hering [mailto:olaf@aepfle.de]
>> > > To me memory overcommit means swapping, which is what xenpaging does:
>> > > turn the whole guest gfn range into some sort of virtual memory,
>> > > transparent to the guest.
>> > >
>> > > xenpaging is the red emergency knob to free some host memory without
>> > > caring about the actual memory constraints within the paged guests.
>> >
>> > Sure, but the whole point of increasing RAM in one or more guests
>> > is to increase performance, and if overcommitting *always* means
>> > swapping, why would anyone use it?
>>
>> The usage patterns depend on the goal. As I wrote, swapping frees host
>> memory so it can be used for something else, like starting yet another
>> guest on the host.
>
> OK, I suppose xenpaging by itself could be useful in a situation such as:
>
> 1) A failover occurs from machine A that has lots of RAM to machine B
> that has much less RAM, and even horribly bad performance is better
> than total service interruption.
> 2) All currently running VMs have been ballooned down "far enough"
> and either have no swap device or insufficiently-sized swap devices,
> Xen simply has no more free space, and horrible performance is
> acceptable.
>
> The historical problem with "hypervisor-based-swapping" solutions such
> as xenpaging is that it is impossible to ensure that "horribly bad
> performance" doesn't start occurring under "normal" circumstances
> specifically because (as Tim indirectly concurs below), policies
> driven by heuristics and external inference (i.e. dom0 trying
> to guess how much memory every domU "needs") just don't work.
>
> As a result, VMware customers outside of some very specific domains
> (domains possibly overlapping with Snowflock?) will tell you
> that "memory overcommit sucks and so we turn it off".
>
> Which is why I raised the question "why are we doing this?"...
> If the answer is "Snowflock customers will benefit from it",
> that's fine. If the answer is "to handle disastrous failover
> situations", that's fine too. And I suppose if the answer is
> "so that VMware can't claim to have a feature that Xen doesn't
> have (even if almost nobody uses it)", I suppose that's fine too.
>
> I'm mostly curious because I've spent the last four years
> trying to solve this problem in a more intelligent way
> and am wondering if the "old way" has improved, or is
> still just the old way but mostly warmed over for Xen.
> And, admittedly, a bit jealous because there's apparently
> so much effort going into the "old way" and not toward
> "a better way".

Is it possible to use the "better way" in Windows? If not, then those who want to support Windows guests are stuck with the old way, until MS discovers / "re-invents" tmem. :-) (Maybe you should give your tmem talk at MS Research? Or try to give it again if you already have?)

Even then there'd still be a market for supporting all those pre-tmem OS's for quite a while.

Apart from that, here's my perspective:
* Having page sharing is good, because of potential memory savings in VDI deployments, but also because of potential for fun things like VM fork, &c
* Having page sharing requires you to have an emergency plan if suddenly all the VMs write to their pages and un-share them.
Hypervisor paging is the only reasonable option here.

If it makes you feel any better, one of the suggested default policies for "what to do with all the memory sharing generates" was "put it into a tmem pool". :-) That, and as we try to hammer out what the implications of default policies are, the semantic gap between balloon drivers, paging, and sharing is rearing a very ugly head -- I would much rather just hand the paging/ballooning stuff over to tmem.

 -George
On Wed, Feb 29, Dan Magenheimer wrote:
> I'm mostly curious because I've spent the last four years
> trying to solve this problem in a more intelligent way
> and am wondering if the "old way" has improved, or is
> still just the old way but mostly warmed over for Xen.
> And, admittedly, a bit jealous because there's apparently
> so much effort going into the "old way" and not toward
> "a better way".

The checkbox thing, and because it's fun, is certainly a big part of it.

I agree that an in-guest solution where all capable guests can give away pages to others is best because there is no IO overhead. And guests know best what pages can be put into the pool.

Then I think that not all guests are capable enough to contribute their free pages. And if host memory is low, then things like swapping can solve the problem at hand.

Olaf
> > As a result, VMware customers outside of some very specific domains
> > (domains possibly overlapping with Snowflock?) will tell you
> > that "memory overcommit sucks and so we turn it off".
> >
> > Which is why I raised the question "why are we doing this?"...
> > If the answer is "Snowflock customers will benefit from it",
> > that's fine. If the answer is "to handle disastrous failover
> > situations", that's fine too. And I suppose if the answer is
> > "so that VMware can't claim to have a feature that Xen doesn't
> > have (even if almost nobody uses it)", I suppose that's fine too.
> >
> > I'm mostly curious because I've spent the last four years
> > trying to solve this problem in a more intelligent way
> > and am wondering if the "old way" has improved, or is
> > still just the old way but mostly warmed over for Xen.
> > And, admittedly, a bit jealous because there's apparently
> > so much effort going into the "old way" and not toward
> > "a better way".
>
> Is it possible to use the "better way" in Windows?

Ian Pratt at one point suggested it might be possible to use some binary modification techniques to put the tmem hooks into Windows, though I am dubious. Lacking that, it would require source changes done at MS.

> If not, then those
> who want to support Windows guests are stuck with the old way, until
> MS discovers / "re-invents" tmem. :-) (Maybe you should give your
> tmem talk at MS Research? Or try to give it again if you already
> have?)

I have an invite (from KY) to visit MS but I think tmem will require broader support (community and corporate) before MS would seriously consider it. Oracle is not exactly MS's best customer. :-}

> Even then there'd still be a market for supporting all those
> pre-tmem OS's for quite a while.

Yes, very true.

> Apart from that, here's my perspective:
> * Having page sharing is good, because of potential memory savings in
> VDI deployments, but also because of potential for fun things like VM
> fork, &c
> * Having page sharing requires you to have an emergency plan if
> suddenly all the VMs write to their pages and un-share them.
> Hypervisor paging is the only reasonable option here.

Yes, agreed.

> If it makes you feel any better, one of the suggested default policies
> for "what to do with all the memory sharing generates" was "put it
> into a tmem pool". :-) That, and as we try to hammer out what the
> implications of default policies are, the semantic gap between balloon
> drivers, paging, and sharing is rearing a very ugly head -- I would
> much rather just hand the paging/ballooning stuff over to tmem.

/me smiles. Yes, thanks, it does make me feel better :-)

I'm not sure putting raw memory into a tmem pool would do more good than just freeing it. Putting *data* into tmem is what makes it valuable, and I think it takes a guest OS to know how and when to put and get the data and (perhaps most importantly) when to flush it to ensure coherency.

BUT, maybe this IS a possible new use of tmem that I just haven't really considered and can't see my way through because of focusing on the other uses for too long. So if I can help with talking this through, let me know.

Thanks,
Dan
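A toy sketch of the put/get/flush contract Dan describes for an ephemeral pool -- this only illustrates the semantics (the guest supplies data, a get may miss, a flush keeps the guest's view coherent); it is not the real tmem hypercall interface, and all names are made up.

# Illustrative ephemeral pool: the host may silently drop anything at any time,
# so the guest must tolerate get() misses and must flush stale data itself.
class EphemeralPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}              # (pool_id, object_id, index) -> page data

    def put(self, key, data):
        if len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))   # host evicts oldest entry (FIFO here)
        self.store[key] = data

    def get(self, key):
        return self.store.get(key)   # may return None: "might not be here when you ask"

    def flush(self, key):
        self.store.pop(key, None)    # guest flushes when the cached data would go stale

pool = EphemeralPool(capacity=2)
key = ("pool0", "inode42", 7)
pool.put(key, b"clean page-cache data")
assert pool.get(key) in (b"clean page-cache data", None)  # a miss is always legal
pool.flush(key)                      # e.g. the corresponding file block was overwritten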
> From: Olaf Hering [mailto:olaf@aepfle.de]
> Subject: Re: [Xen-devel] Pointed questions re Xen memory overcommit
>
> On Wed, Feb 29, Dan Magenheimer wrote:
>
> > I'm mostly curious because I've spent the last four years
> > trying to solve this problem in a more intelligent way
> > and am wondering if the "old way" has improved, or is
> > still just the old way but mostly warmed over for Xen.
> > And, admittedly, a bit jealous because there's apparently
> > so much effort going into the "old way" and not toward
> > "a better way".
>
> The checkbox thing, and because it's fun, is certainly a big part of it.

OK. I can relate to that! ;-)

> I agree that an in-guest solution where all capable guests can give away
> pages to others is best because there is no IO overhead. And guests know
> best what pages can be put into the pool.
>
> Then I think that not all guests are capable enough to contribute their
> free pages. And if host memory is low, then things like swapping can
> solve the problem at hand.

Yes, as George points out, there will always be legacy guests for which swapping can solve a problem.

Dan
On Wed, Feb 29, Dan Magenheimer wrote:
> > From: Olaf Hering [mailto:olaf@aepfle.de]
> > Then I think that not all guests are capable enough to contribute their
> > free pages. And if host memory is low, then things like swapping can
> > solve the problem at hand.
>
> Yes, as George points out, there will always be legacy guests
> for which swapping can solve a problem.

So that's the motivation.

Olaf
On Wed, Feb 29, 2012 at 8:26 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>> If not, then those
>> who want to support Windows guests are stuck with the old way, until
>> MS discovers / "re-invents" tmem. :-) (Maybe you should give your
>> tmem talk at MS Research? Or try to give it again if you already
>> have?)
>
> I have an invite (from KY) to visit MS but I think tmem
> will require broader support (community and corporate) before
> MS would seriously consider it. Oracle is not exactly MS's
> best customer. :-}

Well, of course. But IIUC, it's one of the goals of MS Research to impact product teams, and the incentive structure there is meant to make that happen. So if you can convince somebody / a team at MS Research that tmem is a clever cool new idea that can help Microsoft's bottom line, then *they* should be the ones to shop around for broader support.

In theory anyway. :-)

> I'm not sure putting raw memory into a tmem pool would do
> more good than just freeing it. Putting *data* into tmem
> is what makes it valuable, and I think it takes a guest OS
> to know how and when to put and get the data and (perhaps
> most importantly) when to flush it to ensure coherency.

The thing with the gains from sharing is that you can't really free it. Suppose you have two 2GiB VMs, of which 1GiB is identical at some point in time. That means the two VMs use only 3GiB between them, and you have an extra 1GiB of RAM. However, unlike RAM which is freed by ballooning, this RAM isn't stable: at any point in time, either of the VMs might write to the shared pages, requiring the two VMs to need 2GiB of RAM each again. If this happens, you will need to either:
* Page out 0.5GiB from each VM (*really* bad for performance), or
* Take the 1GiB of RAM back somehow.

In this situation, having that RAM in a tmem pool that the guests can use (or perhaps, dom0 for file caches or whatever) is the best option. I forget the name you had for the different types, but wasn't there a type of tmem where you tell the guest, "Feel free to store something here, but it might not be here when you ask for it again"? That's just the kind of way to use this RAM -- then the hypervisor system can just yank it from the tmem pool if guests start to un-share pages.

The other option would be to allow the guests to decrease their balloon size, allowing them to use the freed memory themselves; and then if a lot of things get unshared, just inflate the balloons again. This is also a decent option, except that due to the semantic gap, we can't guarantee that the balloon won't end up grabbing shared pages -- which doesn't actually free up any more memory.

A really *bad* option, IMHO, is to start a 3rd guest with that 1GiB of freed RAM -- unless you can guarantee that the balloon driver in all of them will be able to react to unsharing events.

Anyway, that's what I meant by using a tmem pool -- does that make sense? Have I misunderstood something about tmem's capabilities?

 -George
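George's two-VM example as arithmetic; the numbers are just his hypothetical figures, spelled out to show why the "freed" RAM is unstable.

# Sharing frees 1GiB between the two 2GiB VMs, but the gain can evaporate
# whenever either guest writes to (un-shares) the shared pages.
host_ram_gib = 4
vm_size_gib = 2
shared_gib = 1                                        # identical across the two VMs

used_while_shared = 2 * vm_size_gib - shared_gib      # 3 GiB resident
freed_by_sharing = host_ram_gib - used_while_shared   # 1 GiB "extra" -- but unstable

def demand_after_unsharing(unshared_gib):
    """Memory needed once unshared_gib of shared pages are written (CoW broken)."""
    return used_while_shared + unshared_gib

# If the freed 1GiB was given away (e.g. to a 3rd guest) and everything then
# un-shares, the shortfall has to come from hypervisor paging:
shortfall = demand_after_unsharing(shared_gib) - (host_ram_gib - freed_by_sharing)
print(freed_by_sharing, shortfall)    # 1 GiB freed now; 1 GiB must be paged out later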
> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Subject: Re: [Xen-devel] Pointed questions re Xen memory overcommit
>
> Well, of course. But IIUC, it's one of the goals of MS Research to
> impact product teams, and the incentive structure there is meant to make
> that happen. So if you can convince somebody / a team at MS Research
> that tmem is a clever cool new idea that can help Microsoft's bottom
> line, then *they* should be the ones to shop around for broader
> support.
>
> In theory anyway. :-)

/me worked in corporate research for 10 years and agrees that theory and reality are very different for the above. That said, I am willing to play a supporting role if there are efforts to convince MS of the value of tmem, just not willing to be the lone salesman.

> > I'm not sure putting raw memory into a tmem pool would do
> > more good than just freeing it. Putting *data* into tmem
> > is what makes it valuable, and I think it takes a guest OS
> > to know how and when to put and get the data and (perhaps
> > most importantly) when to flush it to ensure coherency.
>
> The thing with the gains from sharing is that you can't really free
> it. Suppose you have two 2GiB VMs, of which 1GiB is identical at some
> point in time. That means the two VMs use only 3GiB between them, and
> you have an extra 1GiB of RAM. However, unlike RAM which is freed by
> ballooning, this RAM isn't stable: at any point in time, either of the
> VMs might write to the shared pages, requiring the two VMs to need
> 2GiB of RAM each again. If this happens, you will need to either:
> * Page out 0.5GiB from each VM (*really* bad for performance), or
> * Take the 1GiB of RAM back somehow.
>
> In this situation, having that RAM in a tmem pool that the guests can
> use (or perhaps, dom0 for file caches or whatever) is the best option.
> I forget the name you had for the different types, but wasn't there a
> type of tmem where you tell the guest, "Feel free to store something
> here, but it might not be here when you ask for it again"? That's
> just the kind of way to use this RAM -- then the hypervisor system can
> just yank it from the tmem pool if guests start to un-share pages.
>
> The other option would be to allow the guests to decrease their
> balloon size, allowing them to use the freed memory themselves; and
> then if a lot of things get unshared, just inflate the balloons again.
> This is also a decent option, except that due to the semantic gap, we
> can't guarantee that the balloon won't end up grabbing shared pages --
> which doesn't actually free up any more memory.
>
> A really *bad* option, IMHO, is to start a 3rd guest with that 1GiB of
> freed RAM -- unless you can guarantee that the balloon driver in all
> of them will be able to react to unsharing events.
>
> Anyway, that's what I meant by using a tmem pool -- does that make
> sense? Have I misunderstood something about tmem's capabilities?

One thing I think you may be missing is that pages in an ephemeral pool are next in line after purely free pages, i.e. they are automatically freed in FIFO order if there is a guest demanding memory via ballooning, or if the tools are creating a new guest, or, presumably, if a shared page needs to be split/CoW'ed. IOW, if you have a mixed environment where some guests are unknowingly using page-sharing and others are using tmem, this should (in theory) already work.
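A sketch of the ordering Dan describes -- free pages first, then ephemeral tmem pages reclaimed in FIFO order, with hypervisor paging only as the last resort. Illustrative only; this is not the hypervisor's actual allocator code.

# Toy allocation order: whoever needs host memory (balloon-up, new guest,
# breaking a shared page) gets free pages first, then ephemeral tmem pages.
from collections import deque

free_pages = 2
ephemeral_fifo = deque(["eph-A", "eph-B", "eph-C"])   # oldest ephemeral data first

def alloc_host_page(reason):
    global free_pages
    if free_pages > 0:
        free_pages -= 1
        return f"free page (for {reason})"
    if ephemeral_fifo:
        dropped = ephemeral_fifo.popleft()   # the guest simply sees a tmem get() miss later
        return f"reclaimed {dropped} (for {reason})"
    return f"must resort to xenpaging (for {reason})"

for reason in ["balloon up VM1", "new guest", "CoW split", "CoW split",
               "CoW split", "CoW split"]:
    print(alloc_host_page(reason))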
See "Memory allocation interdependency" in docs/misc/tmem-internals.html (or http://oss.oracle.com/projects/tmem/dist/documentation/internals/xen4-internals-v01.html ) If you are talking about a "pure" environment where no guests are tmem-aware, putting pageframes-recovered-due-to-sharing in an ephemeral tmem pool isn''t AFAICT any different than just freeing them. At least with the current policy and implementation, the results will be the same. But maybe I am missing something important in your proposal. In case anyone has time to read it, the following may be more interesting with all of the "semantic gap" issues fresh in your mind. (Note some of the code links are very out-of-date.) http://oss.oracle.com/projects/tmem/ . And for a more Linux-centric overview: http://lwn.net/Articles/454795/ Dan